1
Chen S, He Y, Lv L, Liu B, Li C, Deng H, Xu J. Transient chemical-mediated epigenetic modulation confers unrestricted lineage potential on human primed pluripotent stem cells. SCIENCE CHINA. LIFE SCIENCES 2025; 68:1084-1101. [PMID: 39825205 DOI: 10.1007/s11427-024-2660-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2024] [Accepted: 06/19/2024] [Indexed: 01/20/2025]
Abstract
Human primed pluripotent stem cells are capable of generating all the embryonic lineages. However, their extraembryonic trophectoderm potentials are limited. It remains unclear how to expand their developmental potential to trophectoderm lineages. Here we show that transient treatment with a cocktail of small molecule epigenetic modulators imparts trophectoderm lineage potentials to human primed pluripotent stem cells while preserving their embryonic potential. These chemically treated cells can generate trophectoderm-like cells and downstream trophoblast stem cells, diverging into syncytiotrophoblast and extravillous trophoblast lineages. Transcriptomic and CUT&Tag analyses reveal that these induced cells share transcriptional profiles with in vivo trophectoderm and cytotrophoblast, and exhibit reduced H3K27me3 modification at gene loci specific to trophoblast lineages compared with primed pluripotent cells. Mechanistic exploration highlighted the critical roles of epigenetic modulators HDAC2, EZH1/2, and KDM5s in the activation of trophoblast lineage potential. Our findings demonstrate that transient epigenetic resetting activates unrestricted lineage potential in human primed pluripotent stem cells, and offer new mechanistic insights into human trophoblast lineage specification and in vitro models for studying placental development and related disorders.
Affiliation(s)
- Shi Chen
  - Department of Cell Biology, School of Basic Medical Sciences, Peking University Stem Cell Research Center, Peking University Health Science Center, Peking University, Beijing, 100191, China
- Yuanyuan He
  - Academy of Advanced Interdisciplinary Studies, Peking University, Beijing, 100871, China
- Lejun Lv
  - BeiCell Therapeutics, Beijing, 100094, China
- Bei Liu
  - BeiCell Therapeutics, Beijing, 100094, China
- Cheng Li
  - School of Life Sciences, Center for Bioinformatics, Center for Statistical Science, Peking University, Beijing, 100871, China
- Hongkui Deng
  - MOE Engineering Research Center of Regenerative Medicine, School of Basic Medical Sciences, State Key Laboratory of Natural and Biomimetic Drugs, Peking University Health Science Center and the MOE Key Laboratory of Cell Proliferation and Differentiation, College of Life Sciences, Peking-Tsinghua Center for Life Sciences, Peking University, Beijing, 100871, China
- Jun Xu
  - Department of Cell Biology, School of Basic Medical Sciences, Peking University Stem Cell Research Center, Peking University Health Science Center, Peking University, Beijing, 100191, China
2
Zheng GM, Wu JW, Li J, Zhao YJ, Zhou C, Ren RC, Wei YM, Zhang XS, Zhao XY. The chromatin accessibility landscape during early maize seed development. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2025; 121:e70073. [PMID: 40127931 PMCID: PMC11932762 DOI: 10.1111/tpj.70073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/25/2024] [Revised: 12/30/2024] [Accepted: 02/18/2025] [Indexed: 03/26/2025]
Abstract
Cis-regulatory elements (CREs) are enriched in accessible chromatin regions (ACRs) of eukaryotes. Despite extensive research on genome-wide ACRs in various plant tissues, the global impact of these changes on developmental processes in maize seeds remains poorly understood. In this study, we employed the assay for transposase-accessible chromatin sequencing (ATAC-seq) to reveal the chromatin accessibility profile throughout the genome during the early stages of maize seed development. We identified a total of 37 952 to 59 887 high-quality ACRs in maize seeds at 0 to 8 days after pollination (DAP). Furthermore, we examined the correlation between the identified ACRs and gene expression. We observed a positive correlation between the open degree of promoter-ACRs and the expression of most genes. Moreover, we identified binding footprints of numerous transcription factors (TFs) within chromatin accessibility regions and revealed key TF families involved in different stages. Through the footprints of accessible chromatin regions, we predicted transcription factor regulatory networks during early maize embryo development. Additionally, we discovered that DNA sequence diversity was notably reduced at ACRs, yet trait-associated SNPs were more likely to be located within ACRs. We edited the ACR containing the trait-associated SNP of NKD1. Both NKD1pro-1 and NKD1pro-2 showed phenotypes corresponding to the trait-associated SNP. Our results suggest that alterations in chromatin accessibility play a crucial role in maize seed development and highlight the potential contribution of open chromatin regions to advancements in maize breeding.
Affiliation(s)
- Guang Ming Zheng
  - State Key Laboratory of Crop Biology, College of Life Sciences, Shandong Agricultural University, Taian, Shandong 271018, China
- Jia Wen Wu
  - State Key Laboratory of Crop Biology, College of Life Sciences, Shandong Agricultural University, Taian, Shandong 271018, China
- Jun Li
  - State Key Laboratory of Crop Biology, College of Life Sciences, Shandong Agricultural University, Taian, Shandong 271018, China
- Ya Jie Zhao
  - State Key Laboratory of Crop Biology, College of Life Sciences, Shandong Agricultural University, Taian, Shandong 271018, China
- Chao Zhou
  - State Key Laboratory of Crop Biology, College of Life Sciences, Shandong Agricultural University, Taian, Shandong 271018, China
- Ru Chang Ren
  - State Key Laboratory of Crop Biology, College of Life Sciences, Shandong Agricultural University, Taian, Shandong 271018, China
- Yi Ming Wei
  - State Key Laboratory of Crop Biology, College of Life Sciences, Shandong Agricultural University, Taian, Shandong 271018, China
- Xian Sheng Zhang
  - State Key Laboratory of Crop Biology, College of Life Sciences, Shandong Agricultural University, Taian, Shandong 271018, China
- Xiang Yu Zhao
  - State Key Laboratory of Crop Biology, College of Life Sciences, Shandong Agricultural University, Taian, Shandong 271018, China
3
Zhao L, Chen J, Zhang Z, Wu W, Lin X, Gao M, Yang Y, Zhao P, Xu S, Yang C, Yao Y, Zhang A, Liu D, Wang D, Xiao J. Deciphering the Transcriptional Regulatory Network Governing Starch and Storage Protein Biosynthesis in Wheat for Breeding Improvement. ADVANCED SCIENCE (WEINHEIM, BADEN-WURTTEMBERG, GERMANY) 2024; 11:e2401383. [PMID: 38943260 PMCID: PMC11434112 DOI: 10.1002/advs.202401383] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2024] [Revised: 05/02/2024] [Indexed: 07/01/2024]
Abstract
Starch and seed storage protein (SSP) composition profoundly impact wheat grain yield and quality. To unveil regulatory mechanisms governing their biosynthesis, transcriptome and epigenome profiling is conducted across key endosperm developmental stages, revealing that chromatin accessibility, H3K27ac, and H3K27me3 collectively regulate SSP and starch genes with varying impact. Population transcriptome and phenotype analyses highlight accessible promoter regions' crucial role as a genetic variation resource, influencing grain yield and quality in a core collection of wheat accessions. Integration of time-serial RNA-seq and ATAC-seq enables the construction of a hierarchical transcriptional regulatory network governing starch and SSP biosynthesis, identifying 42 high-confidence novel candidates. These candidates exhibit overlap with genetic regions associated with grain size and quality traits, and their functional significance is validated through expression-phenotype association analysis among wheat accessions and loss-of-function mutants. Functional analysis of wheat abscisic acid insensitive 3-A1 (TaABI3-A1) with genome editing knock-out lines demonstrates its role in promoting SSP accumulation while repressing starch biosynthesis through transcriptional regulation. The elite haplotype TaABI3-A1Hap1, associated with enhanced grain weight, is selected during the breeding process in China and is linked to altered expression levels. This study unveils key regulators, advancing understanding of SSP and starch biosynthesis regulation and contributing to breeding enhancement.
Affiliation(s)
- Long Zhao
  - Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China
  - College of Advanced Agricultural Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Jinchao Chen
  - Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China
  - College of Advanced Agricultural Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Zhaoheng Zhang
  - Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China
  - College of Advanced Agricultural Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Wenying Wu
  - Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China
  - College of Advanced Agricultural Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Xuelei Lin
  - Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China
- Mingxiang Gao
  - State Key Laboratory of North China Crop Improvement and Regulation, Hebei Agricultural University, Baoding, Hebei 071001, China
- Yiman Yang
  - Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China
  - State Key Laboratory of Crop Genetics & Germplasm Enhancement and Utilization, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Peng Zhao
  - Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China
  - State Key Laboratory for Crop Stress Resistance and High‐Efficiency Production, College of Agronomy, Northwest A&F University, Yangling 712100, China
- Shengbao Xu
  - State Key Laboratory for Crop Stress Resistance and High‐Efficiency Production, College of Agronomy, Northwest A&F University, Yangling 712100, China
- Changfeng Yang
  - State Key Laboratory for Agrobiotechnology, Key Laboratory of Crop Heterosis Utilization (MOE), China Agricultural University, Beijing 100193, China
- Yingyin Yao
  - State Key Laboratory for Agrobiotechnology, Key Laboratory of Crop Heterosis Utilization (MOE), China Agricultural University, Beijing 100193, China
- Aimin Zhang
  - Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China
  - State Key Laboratory of North China Crop Improvement and Regulation, Hebei Agricultural University, Baoding, Hebei 071001, China
- Dongcheng Liu
  - State Key Laboratory of North China Crop Improvement and Regulation, Hebei Agricultural University, Baoding, Hebei 071001, China
- Dongzhi Wang
  - Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China
- Jun Xiao
  - Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China
  - College of Advanced Agricultural Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
  - Centre of Excellence for Plant and Microbial Science (CEPAMS), JIC‐CAS, Beijing 100101, China
4
Paquette A, Ahuna K, Hwang YM, Pearl J, Liao H, Shannon P, Kadam L, Lapehn S, Bucher M, Roper R, Funk C, MacDonald J, Bammler T, Baloni P, Brockway H, Mason WA, Bush N, Lewinn KZ, Karr CJ, Stamatoyannopoulos J, Muglia LJ, Jones H, Sadovsky Y, Myatt L, Sathyanarayana S, Price ND. A genome scale transcriptional regulatory model of the human placenta. SCIENCE ADVANCES 2024; 10:eadf3411. [PMID: 38941464 PMCID: PMC11212735 DOI: 10.1126/sciadv.adf3411] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2022] [Accepted: 05/28/2024] [Indexed: 06/30/2024]
Abstract
Gene regulation is essential to placental function and fetal development. We built a genome-scale transcriptional regulatory network (TRN) of the human placenta using digital genomic footprinting and transcriptomic data. We integrated 475 transcriptomes and 12 DNase hypersensitivity datasets from placental samples to globally and quantitatively map transcription factor (TF)-target gene interactions. In an independent dataset, the TRN model predicted target gene expression with an out-of-sample R² greater than 0.25 for 73% of target genes. We performed siRNA knockdowns of four TFs and achieved concordance between the predicted gene targets in our TRN and differences in expression of knockdowns with an accuracy of >0.7 for three of the four TFs. Our final model contained 113,158 interactions across 391 TFs and 7712 target genes and is publicly available. We identified 29 TFs that were significantly enriched as regulators for genes previously associated with preterm birth, and eight of these TFs were decreased in preterm placentas.
Affiliation(s)
- Alison Paquette
  - University of Washington, Seattle, WA, USA
  - Seattle Children’s Research Institute, Seattle, WA, USA
- Kylia Ahuna
  - Oregon Health and Sciences University, Portland, OR, USA
- Hanna Liao
  - University of Washington, Seattle, WA, USA
- Leena Kadam
  - Oregon Health and Sciences University, Portland, OR, USA
- Matthew Bucher
  - Oregon Health and Sciences University, Portland, OR, USA
- Ryan Roper
  - Institute for Systems Biology, Seattle, WA, USA
- Cory Funk
  - Institute for Systems Biology, Seattle, WA, USA
- Heather Brockway
  - Department of Physiology and Aging, University of Florida, Gainesville, FL, USA
- W. Alex Mason
  - University of Tennessee Health Sciences Center, Memphis, TN, USA
- Nicole Bush
  - University of California San Francisco, San Francisco, CA, USA
- Kaja Z. Lewinn
  - University of California San Francisco, San Francisco, CA, USA
- Louis J. Muglia
  - The Burroughs Wellcome Fund, Research Triangle Park, NC, USA
  - Cincinnati Children’s Hospital Medical Center and Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA
- Yoel Sadovsky
  - Magee Womens Research Institute, Pittsburgh, PA, USA
  - University of Pittsburgh, Pittsburgh, PA, USA
- Leslie Myatt
  - Oregon Health and Sciences University, Portland, OR, USA
- Sheela Sathyanarayana
  - University of Washington, Seattle, WA, USA
  - Seattle Children’s Research Institute, Seattle, WA, USA
- Nathan D. Price
  - Institute for Systems Biology, Seattle, WA, USA
  - Thorne HealthTech, New York City, NY, USA
5
Lin X, Xu Y, Wang D, Yang Y, Zhang X, Bie X, Gui L, Chen Z, Ding Y, Mao L, Zhang X, Lu F, Zhang X, Uauy C, Fu X, Xiao J. Systematic identification of wheat spike developmental regulators by integrated multi-omics, transcriptional network, GWAS, and genetic analyses. MOLECULAR PLANT 2024; 17:438-459. [PMID: 38310351 DOI: 10.1016/j.molp.2024.01.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Revised: 11/29/2023] [Accepted: 01/30/2024] [Indexed: 02/05/2024]
Abstract
The spike architecture of wheat plays a crucial role in determining grain number, making it a key trait for optimization in wheat breeding programs. In this study, we used a multi-omic approach to analyze the transcriptome and epigenome profiles of the young spike at eight developmental stages, revealing coordinated changes in chromatin accessibility and H3K27me3 abundance during the flowering transition. We constructed a core transcriptional regulatory network (TRN) that drives wheat spike formation and experimentally validated a multi-layer regulatory module involving TaSPL15, TaAGLG1, and TaFUL2. By integrating the TRN with genome-wide association studies, we identified 227 transcription factors, including 42 with known functions and 185 with unknown functions. Further investigation of 61 novel transcription factors using multiple homozygous mutant lines revealed 36 transcription factors that regulate spike architecture or flowering time, such as TaMYC2-A1, TaMYB30-A1, and TaWRKY37-A1. Of particular interest, TaMYB30-A1, downstream of and repressed by WFZP, was found to regulate fertile spikelet number. Notably, the excellent haplotype of TaMYB30-A1, which contains a C allele at the WFZP binding site, was enriched during wheat breeding improvement in China, leading to improved agronomic traits. Finally, we constructed a free and open access Wheat Spike Multi-Omic Database (http://39.98.48.156:8800/#/). Our study identifies novel and high-confidence regulators and offers an effective strategy for dissecting the genetic basis of wheat spike development, with practical value for wheat breeding.
Affiliation(s)
- Xuelei Lin
  - Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China
- Yongxin Xu
  - Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Dongzhi Wang
  - Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China
- Yiman Yang
  - Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China; Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Xiaoyu Zhang
  - Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Xiaomin Bie
  - Key Laboratory of Crop Biology, College of Life Sciences, Shandong Agricultural University, Tai'an, Shandong 271018, China
- Lixuan Gui
  - Department of Life Science, Tcuni Inc., Chengdu, Sichuan 610000, China
- Zhongxu Chen
  - Department of Life Science, Tcuni Inc., Chengdu, Sichuan 610000, China
- Yiliang Ding
  - John Innes Centre, Norwich Research Park, Norwich NR4 7UH, UK
- Long Mao
  - Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Xueyong Zhang
  - Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Fei Lu
  - Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China; CAS-JIC Centre of Excellence for Plant and Microbial Science (CEPAMS), Institute of Genetics and Developmental Biology, CAS, Beijing 100101, China
- Xiansheng Zhang
  - Key Laboratory of Crop Biology, College of Life Sciences, Shandong Agricultural University, Tai'an, Shandong 271018, China
- Cristobal Uauy
  - John Innes Centre, Norwich Research Park, Norwich NR4 7UH, UK
- Xiangdong Fu
  - Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Jun Xiao
  - Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China; CAS-JIC Centre of Excellence for Plant and Microbial Science (CEPAMS), Institute of Genetics and Developmental Biology, CAS, Beijing 100101, China
6
Long T, Bhattacharyya T, Repele A, Naylor M, Nooti S, Krueger S, Manu. The contributions of DNA accessibility and transcription factor occupancy to enhancer activity during cellular differentiation. G3 (BETHESDA, MD.) 2024; 14:jkad269. [PMID: 38124496 PMCID: PMC11090500 DOI: 10.1093/g3journal/jkad269] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2023] [Accepted: 11/01/2023] [Indexed: 12/23/2023]
Abstract
During gene regulation, DNA accessibility is thought to limit the availability of transcription factor (TF) binding sites, while TFs can increase DNA accessibility to recruit additional factors that upregulate gene expression. Given this interplay, the causative regulatory events in the modulation of gene expression remain unknown for the vast majority of genes. We utilized deeply sequenced ATAC-Seq data and site-specific knock-in reporter genes to investigate the relationship between the binding-site resolution dynamics of DNA accessibility and the expression dynamics of the enhancers of Cebpa during macrophage-neutrophil differentiation. While the enhancers upregulate reporter expression during the earliest stages of differentiation, there is little corresponding increase in their total accessibility. Conversely, total accessibility peaks during the last stages of differentiation without any increase in enhancer activity. The accessibility of positions neighboring C/EBP-family TF binding sites, which indicates TF occupancy, does increase significantly during early differentiation, showing that the early upregulation of enhancer activity is driven by TF binding. These results imply that a generalized increase in DNA accessibility is not sufficient, and binding by enhancer-specific TFs is necessary, for the upregulation of gene expression. Additionally, high-coverage ATAC-Seq combined with time-series expression data can infer the sequence of regulatory events at binding-site resolution.
Affiliation(s)
- Trevor Long
  - Department of Biology, University of North Dakota, Grand Forks, ND 58202-9019, USA
- Tapas Bhattacharyya
  - Department of Biology, University of North Dakota, Grand Forks, ND 58202-9019, USA
- Andrea Repele
  - Department of Biology, University of North Dakota, Grand Forks, ND 58202-9019, USA
- Madison Naylor
  - Department of Biology, University of North Dakota, Grand Forks, ND 58202-9019, USA
- Sunil Nooti
  - Department of Biology, University of North Dakota, Grand Forks, ND 58202-9019, USA
- Shawn Krueger
  - Department of Biology, University of North Dakota, Grand Forks, ND 58202-9019, USA
- Manu
  - Department of Biology, University of North Dakota, Grand Forks, ND 58202-9019, USA
7
Long T, Bhattacharyya T, Repele A, Naylor M, Nooti S, Krueger S, Manu. The contributions of DNA accessibility and transcription factor occupancy to enhancer activity during cellular differentiation. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.02.22.529579. [PMID: 37090616 PMCID: PMC10120690 DOI: 10.1101/2023.02.22.529579] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/25/2023]
Abstract
The upregulation of gene expression by enhancers depends upon the interplay between the binding of sequence-specific transcription factors (TFs) and DNA accessibility. DNA accessibility is thought to limit the ability of TFs to bind to their sites, while TFs can increase accessibility to recruit additional factors that upregulate gene expression. Given this interplay, the causative regulatory events underlying the modulation of gene expression during cellular differentiation remain unknown for the vast majority of genes. We investigated the binding-site resolution dynamics of DNA accessibility and the expression dynamics of the enhancers of an important neutrophil gene, Cebpa, during macrophage-neutrophil differentiation. Reporter genes were integrated in a site-specific manner in PUER cells, which are progenitors that can be differentiated into neutrophils or macrophages in vitro by activating the pan-leukocyte TF PU.1. Time series data show that two enhancers upregulate reporter expression during the first 48 hours of neutrophil differentiation. Surprisingly, there is little or no increase in the total accessibility, measured by ATAC-Seq, of the enhancers during the same time period. Conversely, total accessibility peaks 96 hours after PU.1 activation, consistent with its role as a pioneer, but the enhancers do not upregulate gene expression. Combining deeply sequenced ATAC-Seq data with a new bias-correction method allowed the profiling of accessibility at single-nucleotide resolution and revealed protected regions in the enhancers that match all previously characterized TF binding sites and ChIP-Seq data. Although the accessibility of most positions does not change during early differentiation, that of positions neighboring TF binding sites, an indicator of TF occupancy, did increase significantly. The localized accessibility changes are limited to nucleotides neighboring C/EBP-family TF binding sites, showing that the upregulation of enhancer activity during early differentiation is driven by C/EBP-family TF binding. These results show that increasing the total accessibility of enhancers is not sufficient for upregulating their activity and other events such as TF binding are necessary for upregulation. Also, TF binding can cause upregulation without a perceptible increase in total accessibility. Finally, this study demonstrates the feasibility of comprehensively mapping individual TF binding sites as footprints using high coverage ATAC-Seq and inferring the sequence of events in gene regulation by combining with time-series gene expression data.
Affiliation(s)
- Trevor Long
  - Department of Biology, University of North Dakota, Grand Forks, 58202-9019 ND, USA
- Tapas Bhattacharyya
  - Department of Biology, University of North Dakota, Grand Forks, 58202-9019 ND, USA
- Andrea Repele
  - Department of Biology, University of North Dakota, Grand Forks, 58202-9019 ND, USA
- Madison Naylor
  - Department of Biology, University of North Dakota, Grand Forks, 58202-9019 ND, USA
- Sunil Nooti
  - Department of Biology, University of North Dakota, Grand Forks, 58202-9019 ND, USA
- Shawn Krueger
  - Department of Biology, University of North Dakota, Grand Forks, 58202-9019 ND, USA
- Manu
  - Department of Biology, University of North Dakota, Grand Forks, 58202-9019 ND, USA
8
Karakaslar EO, Katiyar N, Hasham M, Youn A, Sharma S, Chung C, Marches R, Korstanje R, Banchereau J, Ucar D. Transcriptional activation of Jun and Fos members of the AP-1 complex is a conserved signature of immune aging that contributes to inflammaging. Aging Cell 2023; 22:e13792. [PMID: 36840360 PMCID: PMC10086525 DOI: 10.1111/acel.13792] [Citation(s) in RCA: 36] [Impact Index Per Article: 18.0] [Received: 08/10/2022] [Revised: 12/20/2022] [Accepted: 01/25/2023] [Indexed: 02/26/2023]
Abstract
Diverse mouse strains have different health and life spans, mimicking the diversity among humans. To capture conserved aging signatures, we studied long-lived C57BL/6J and short-lived NZO/HILtJ mouse strains by profiling transcriptomes and epigenomes of immune cells from peripheral blood and the spleen from young and old mice. Transcriptional activation of the AP-1 transcription factor complex, particularly Fos, Junb, and Jun genes, was the most significant and conserved aging signature across tissues and strains. ATAC-seq data analyses showed that the chromatin around these genes was more accessible with age and there were significantly more binding sites for these TFs with age across all studied tissues, targeting pro-inflammatory molecules including Il6. Age-related increases in binding sites of JUN and FOS factors were also conserved in human peripheral blood ATAC-seq data. Single-cell RNA-seq data from the mouse aging cell atlas Tabula Muris Senis showed that the expression of these genes increased with age in B, T, NK cells, and macrophages, with macrophages from old mice expressing these molecules more abundantly than other cells. Functional data showed that upon myeloid cell activation via poly(I:C), the levels of JUN protein and its binding activity increased more significantly in spleen cells from old compared to young mice. In addition, upon activation, old cells produced more IL6 compared to young cells. In sum, we showed that the aging-related transcriptional activation of Jun and Fos family members in AP-1 complex is conserved across immune tissues and long- and short-living mouse strains, possibly contributing to increased inflammation with age.
Affiliation(s)
- Emin Onur Karakaslar
  - The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut, USA
  - Leiden University Medical Center (LUMC), Leiden, The Netherlands
- Neerja Katiyar
  - The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut, USA
- Muneer Hasham
  - The Jackson Laboratory for Mammalian Genetics, Bar Harbor, Maine, USA
- Cheng‐han Chung
  - The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut, USA
- Radu Marches
  - The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut, USA
- Ron Korstanje
  - The Jackson Laboratory for Mammalian Genetics, Bar Harbor, Maine, USA
- Jacques Banchereau
  - The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut, USA
  - Immunai, New York, New York, USA
- Duygu Ucar
  - The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut, USA
  - Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, Connecticut, USA
9
Ding K, Sun S, Luo Y, Long C, Zhai J, Zhai Y, Wang G. PlantCADB: A Comprehensive Plant Chromatin Accessibility Database. GENOMICS, PROTEOMICS & BIOINFORMATICS 2023; 21:311-323. [PMID: 36328151 PMCID: PMC10626055 DOI: 10.1016/j.gpb.2022.10.005] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/08/2022] [Revised: 09/25/2022] [Accepted: 10/24/2022] [Indexed: 11/16/2022]
Abstract
Chromatin accessibility landscapes are essential for detecting regulatory elements, illustrating the corresponding regulatory networks, and, ultimately, understanding the molecular basis underlying key biological processes. With the advancement of sequencing technologies, a large volume of chromatin accessibility data has been accumulated and integrated for humans and other mammals. These data have greatly advanced the study of disease pathogenesis, cancer survival prognosis, and tissue development. To advance the understanding of molecular mechanisms regulating plant key traits and biological processes, we developed a comprehensive plant chromatin accessibility database (PlantCADB) from 649 samples of 37 species. These samples are abiotic stress-related (such as heat, cold, drought, and salt; 159 samples), development-related (232 samples), and/or tissue-specific (376 samples). Overall, 18,339,426 accessible chromatin regions (ACRs) were compiled. These ACRs were annotated with genomic information, associated genes, transcription factor footprint, motif, and single-nucleotide polymorphisms (SNPs). Additionally, PlantCADB provides various tools to visualize ACRs and corresponding annotations. It thus forms an integrated, annotated, and analyzed plant-related chromatin accessibility resource, which can aid in better understanding genetic regulatory networks underlying development, important traits, stress adaptations, and evolution. PlantCADB is freely available at https://bioinfor.nefu.edu.cn/PlantCADB/.
Affiliation(s)
- Ke Ding
- State Key Laboratory of Tree Genetics and Breeding, Northeast Forestry University, Harbin 150040, China; College of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China
| | - Shanwen Sun
- College of Life Science, Northeast Forestry University, Harbin 150040, China
| | - Yang Luo
- College of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China
| | - Chaoyue Long
- College of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China
| | - Jingwen Zhai
- College of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China
| | - Yixiao Zhai
- College of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China
| | - Guohua Wang
- State Key Laboratory of Tree Genetics and Breeding, Northeast Forestry University, Harbin 150040, China; College of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China.
| |
|
10
|
Whalen S, Inoue F, Ryu H, Fair T, Markenscoff-Papadimitriou E, Keough K, Kircher M, Martin B, Alvarado B, Elor O, Laboy Cintron D, Williams A, Hassan Samee MA, Thomas S, Krencik R, Ullian EM, Kriegstein A, Rubenstein JL, Shendure J, Pollen AA, Ahituv N, Pollard KS. Machine learning dissection of human accelerated regions in primate neurodevelopment. Neuron 2023; 111:857-873.e8. [PMID: 36640767 PMCID: PMC10023452 DOI: 10.1016/j.neuron.2022.12.026] [Citation(s) in RCA: 55] [Impact Index Per Article: 27.5] [Received: 06/21/2022] [Revised: 09/29/2022] [Accepted: 12/18/2022] [Indexed: 01/15/2023]
Abstract
Using machine learning (ML), we interrogated the function of all human-chimpanzee variants in 2,645 human accelerated regions (HARs), finding 43% of HARs have variants with large opposing effects on chromatin state and 14% on neurodevelopmental enhancer activity. This pattern, consistent with compensatory evolution, was confirmed using massively parallel reporter assays in chimpanzee and human neural progenitor cells. The species-specific enhancer activity of HARs was accurately predicted from the presence and absence of transcription factor footprints in each species. Despite these striking cis effects, activity of a given HAR sequence was nearly identical in human and chimpanzee cells. This suggests that HARs did not evolve to compensate for changes in the trans environment but instead altered their ability to bind factors present in both species. Thus, ML prioritized variants with functional effects on human neurodevelopment and revealed an unexpected reason why HARs may have evolved so rapidly.
Affiliation(s)
- Sean Whalen
- Gladstone Institutes, San Francisco, CA 94158, USA
| | - Fumitaka Inoue
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA; Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
| | - Hane Ryu
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA; Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA; Pharmaceutical Sciences and Pharmacogenomics Graduate Program, University of California, San Francisco, San Francisco, CA, USA
| | - Tyler Fair
- Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA 94143, USA; Department of Neurology, University of California, San Francisco, San Francisco, CA 94158, USA
| | | | - Kathleen Keough
- Gladstone Institutes, San Francisco, CA 94158, USA; Pharmaceutical Sciences and Pharmacogenomics Graduate Program, University of California, San Francisco, San Francisco, CA, USA
| | - Martin Kircher
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA; Berlin Institute of Health at Charité - Universitätsmedizin Berlin, 10117 Berlin, Germany; Institute of Human Genetics, University Medical Center Schleswig-Holstein, University of Lübeck, 23562 Lübeck, Germany
| | - Beth Martin
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
| | - Beatriz Alvarado
- Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA 94143, USA
| | - Orry Elor
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA; Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
| | - Dianne Laboy Cintron
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA; Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
| | | | | | - Sean Thomas
- Gladstone Institutes, San Francisco, CA 94158, USA
| | - Robert Krencik
- Department of Neurosurgery, Center for Neuroregeneration, Houston Methodist Research Institute, Houston, TX, USA
| | - Erik M Ullian
- Departments of Ophthalmology and Physiology, University of California, San Francisco, San Francisco, CA, USA; Kavli Institute for Fundamental Neuroscience, University of California, San Francisco, San Francisco, CA, USA
| | - Arnold Kriegstein
- Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA 94143, USA; Department of Neurology, University of California, San Francisco, San Francisco, CA 94158, USA
| | - John L Rubenstein
- Department of Psychiatry, University of California, San Francisco, San Francisco, CA, USA
| | - Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA; Howard Hughes Medical Institute, Seattle, WA 98195, USA; Brotman Baty Institute for Precision Medicine, Seattle, WA 98195, USA
| | - Alex A Pollen
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA; Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA 94143, USA; Department of Neurology, University of California, San Francisco, San Francisco, CA 94158, USA
| | - Nadav Ahituv
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA; Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA.
| | - Katherine S Pollard
- Gladstone Institutes, San Francisco, CA 94158, USA; Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA; Department of Epidemiology and Biostatistics and Institute for Computational Health Sciences, University of California, San Francisco, San Francisco, CA, USA; Chan-Zuckerberg Biohub, San Francisco, CA, USA.
| |
|
11
|
Li Z, Kuo CC, Ticconi F, Shaigan M, Gehrmann J, Gusmao EG, Allhoff M, Manolov M, Zenke M, Costa IG. RGT: a toolbox for the integrative analysis of high throughput regulatory genomics data. BMC Bioinformatics 2023; 24:79. [PMID: 36879236 PMCID: PMC9990262 DOI: 10.1186/s12859-023-05184-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2022] [Accepted: 02/13/2023] [Indexed: 03/08/2023] Open
Abstract
BACKGROUND: Massive amounts of data are produced by combining next-generation sequencing with complex biochemistry techniques to characterize regulatory genomics profiles, such as protein-DNA interaction and chromatin accessibility. Interpretation of such high-throughput data typically requires different computation methods. However, existing tools are usually developed for a specific task, which makes it challenging to analyze the data in an integrative manner. RESULTS: We here describe the Regulatory Genomics Toolbox (RGT), a computational library for the integrative analysis of regulatory genomics data. RGT provides different functionalities to handle genomic signals and regions. Based on that, we developed several tools to perform distinct downstream analyses, including the prediction of transcription factor binding sites using ATAC-seq data, identification of differential peaks from ChIP-seq data, detection of triple helix mediated RNA and DNA interactions, visualization, and finding an association between distinct regulatory factors. CONCLUSION: We present here RGT, a framework to facilitate the customization of computational methods to analyze genomic data for specific regulatory genomics problems. RGT is a comprehensive and flexible Python package for analyzing high throughput regulatory genomics data and is available at https://github.com/CostaLab/reg-gen. The documentation is available at https://reg-gen.readthedocs.io.
Affiliation(s)
- Zhijian Li
- Institute for Computational Genomics, Medical Faculty, RWTH Aachen University, 52074, Aachen, Germany.
- Joint Research Center for Computational Biomedicine, RWTH Aachen University Hospital, 52074, Aachen, Germany.
| | - Chao-Chung Kuo
- Institute for Computational Genomics, Medical Faculty, RWTH Aachen University, 52074, Aachen, Germany
- Joint Research Center for Computational Biomedicine, RWTH Aachen University Hospital, 52074, Aachen, Germany
| | - Fabio Ticconi
- Institute for Computational Genomics, Medical Faculty, RWTH Aachen University, 52074, Aachen, Germany
- Joint Research Center for Computational Biomedicine, RWTH Aachen University Hospital, 52074, Aachen, Germany
| | - Mina Shaigan
- Institute for Computational Genomics, Medical Faculty, RWTH Aachen University, 52074, Aachen, Germany
- Joint Research Center for Computational Biomedicine, RWTH Aachen University Hospital, 52074, Aachen, Germany
| | - Julia Gehrmann
- Institute for Computational Genomics, Medical Faculty, RWTH Aachen University, 52074, Aachen, Germany
- Joint Research Center for Computational Biomedicine, RWTH Aachen University Hospital, 52074, Aachen, Germany
| | - Eduardo Gade Gusmao
- Institute for Computational Genomics, Medical Faculty, RWTH Aachen University, 52074, Aachen, Germany
- Joint Research Center for Computational Biomedicine, RWTH Aachen University Hospital, 52074, Aachen, Germany
| | - Manuel Allhoff
- Institute for Computational Genomics, Medical Faculty, RWTH Aachen University, 52074, Aachen, Germany
- Joint Research Center for Computational Biomedicine, RWTH Aachen University Hospital, 52074, Aachen, Germany
| | - Martin Manolov
- Institute for Computational Genomics, Medical Faculty, RWTH Aachen University, 52074, Aachen, Germany
- Joint Research Center for Computational Biomedicine, RWTH Aachen University Hospital, 52074, Aachen, Germany
| | - Martin Zenke
- Department of Cell Biology, Institute of Biomedical Engineering, RWTH Aachen University Medical School, 52074, Aachen, Germany
- Helmholtz Institute for Biomedical Engineering, RWTH Aachen University, 52074, Aachen, Germany
- Department of Hematology, Oncology, Hemostaseology, and Stem Cell Transplantation, Faculty of Medicine, RWTH Aachen University, 52074, Aachen, Germany
| | - Ivan G Costa
- Institute for Computational Genomics, Medical Faculty, RWTH Aachen University, 52074, Aachen, Germany.
- Joint Research Center for Computational Biomedicine, RWTH Aachen University Hospital, 52074, Aachen, Germany.
| |
|
12
|
Zhao L, Yang Y, Chen J, Lin X, Zhang H, Wang H, Wang H, Bie X, Jiang J, Feng X, Fu X, Zhang X, Du Z, Xiao J. Dynamic chromatin regulatory programs during embryogenesis of hexaploid wheat. Genome Biol 2023; 24:7. [PMID: 36639687 PMCID: PMC9837924 DOI: 10.1186/s13059-022-02844-2] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 02/14/2022] [Accepted: 12/31/2022] [Indexed: 01/15/2023] Open
Abstract
BACKGROUND: Plant and animal embryogenesis have conserved and distinct features. Cell fate transitions occur during embryogenesis in both plants and animals. The epigenomic processes regulating plant embryogenesis remain largely elusive. RESULTS: Here, we elucidate chromatin and transcriptomic dynamics during embryogenesis of the most cultivated crop, hexaploid wheat. Time-series analysis reveals stage-specific and proximal-distal distinct chromatin accessibility and dynamics concordant with transcriptome changes. Following fertilization, the remodeling kinetics of H3K4me3, H3K27ac, and H3K27me3 differ from those in mammals, highlighting considerable species-specific epigenomic dynamics during zygotic genome activation. Polycomb repressive complex 2 (PRC2)-mediated H3K27me3 deposition is important for embryo establishment. Later, H3K27ac, H3K27me3, and chromatin accessibility undergo dramatic remodeling to establish a permissive chromatin environment facilitating the access of transcription factors to cis-elements for fate patterning. Embryonic maturation is characterized by increasing H3K27me3 and decreasing chromatin accessibility, which likely participates in restricting totipotency while preventing extensive organogenesis. Finally, epigenomic signatures are correlated with biased expression among homeolog triads and divergent expression after polyploidization, revealing an epigenomic contributor to subgenome diversification in an allohexaploid genome. CONCLUSIONS: Collectively, we present an invaluable resource for comparative and mechanistic analysis of the epigenomic regulation of crop embryogenesis.
Affiliation(s)
- Long Zhao
- Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China; University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Yiman Yang
- Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China; Nanjing Agricultural University, Nanjing, Jiangsu, China
| | - Jinchao Chen
- Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China; University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Xuelei Lin
- Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China
| | - Hao Zhang
- Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China; University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Hao Wang
- Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China; University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Hongzhe Wang
- Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China
| | - Xiaomin Bie
- Shandong Agricultural University, Tai'an, Shandong, China
| | - Jiafu Jiang
- Nanjing Agricultural University, Nanjing, Jiangsu, China
| | - Xiaoqi Feng
- John Innes Centre, Colney Lane, Norwich, NR4 7UH, UK
| | - Xiangdong Fu
- Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China; University of Chinese Academy of Sciences, Beijing, 100049, China
| | | | - Zhuo Du
- University of Chinese Academy of Sciences, Beijing, 100049, China; State Key Laboratory of Molecular Developmental Biology, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China
| | - Jun Xiao
- Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China; University of Chinese Academy of Sciences, Beijing, 100049, China; CAS-JIC Centre of Excellence for Plant and Microbial Science (CEPAMS), Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China
| |
|
13
|
Hoffmann M, Trummer N, Schwartz L, Jankowski J, Lee HK, Willruth LL, Lazareva O, Yuan K, Baumgarten N, Schmidt F, Baumbach J, Schulz MH, Blumenthal DB, Hennighausen L, List M. TF-Prioritizer: a Java pipeline to prioritize condition-specific transcription factors. GigaScience 2023; 12:giad026. [PMID: 37132521 PMCID: PMC10155229 DOI: 10.1093/gigascience/giad026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2022] [Revised: 02/23/2023] [Accepted: 04/05/2023] [Indexed: 05/04/2023] Open
Abstract
BACKGROUND: Eukaryotic gene expression is controlled by cis-regulatory elements (CREs), including promoters and enhancers, which are bound by transcription factors (TFs). Differential expression of TFs and their binding affinity at putative CREs determine tissue- and developmental-specific transcriptional activity. Consolidating genomic datasets can offer further insights into the accessibility of CREs, TF activity, and, thus, gene regulation. However, the integration and analysis of multimodal datasets are hampered by considerable technical challenges. While methods for highlighting differential TF activity from combined chromatin state data (e.g., chromatin immunoprecipitation [ChIP], ATAC, or DNase sequencing) and RNA sequencing data exist, they do not offer convenient usability, have limited support for large-scale data processing, and provide only minimal functionality for visually interpreting results. RESULTS: We developed TF-Prioritizer, an automated pipeline that prioritizes condition-specific TFs from multimodal data and generates an interactive web report. We demonstrated its potential by identifying known TFs along with their target genes, as well as previously unreported TFs active in lactating mouse mammary glands. Additionally, we studied a variety of ENCODE datasets for cell lines K562 and MCF-7, including 12 histone modification ChIP sequencing as well as ATAC and DNase sequencing datasets, where we observe and discuss assay-specific differences. CONCLUSION: TF-Prioritizer accepts ATAC, DNase, or ChIP sequencing and RNA sequencing data as input and identifies TFs with differential activity, thus offering an understanding of genome-wide gene regulation, potential pathogenesis, and therapeutic targets in biomedical research.
Affiliation(s)
- Markus Hoffmann
- Big Data in BioMedicine Group, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Freising D-85354, Germany
- Institute for Advanced Study, Technical University of Munich, Garching D-85748, Germany
- National Institute of Diabetes, Digestive, and Kidney Diseases, National Institutes of Health, Bethesda, MD 20892, USA
| | - Nico Trummer
- Big Data in BioMedicine Group, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Freising D-85354, Germany
| | - Leon Schwartz
- Big Data in BioMedicine Group, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Freising D-85354, Germany
| | - Jakub Jankowski
- National Institute of Diabetes, Digestive, and Kidney Diseases, National Institutes of Health, Bethesda, MD 20892, USA
| | - Hye Kyung Lee
- National Institute of Diabetes, Digestive, and Kidney Diseases, National Institutes of Health, Bethesda, MD 20892, USA
| | - Lina-Liv Willruth
- Big Data in BioMedicine Group, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Freising D-85354, Germany
| | - Olga Lazareva
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Junior Clinical Cooperation Unit, Multiparametric Methods for Early Detection of Prostate Cancer, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, 69117 Heidelberg, Germany
| | - Kevin Yuan
- Big Data Institute, Nuffield Department of Population Health, University of Oxford, Oxford OX3 7LF, UK
| | - Nina Baumgarten
- Institute of Cardiovascular Regeneration, Goethe University, 60590 Frankfurt am Main, Germany
- German Center for Cardiovascular Research, Partner site Rhein-Main, 60590 Frankfurt am Main, Germany
- Cardio-Pulmonary Institute, Goethe University Hospital, 60590 Frankfurt am Main, Germany
| | - Florian Schmidt
- Laboratory of Systems Biology and Data Analytics, Genome Institute of Singapore, 60 Biopolis Street, Singapore 138672, Singapore
| | - Jan Baumbach
- Chair of Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Computational BioMedicine Lab, University of Southern Denmark, Odense, Denmark
| | - Marcel H Schulz
- Institute of Cardiovascular Regeneration, Goethe University, 60590 Frankfurt am Main, Germany
- German Center for Cardiovascular Research, Partner site Rhein-Main, 60590 Frankfurt am Main, Germany
- Cardio-Pulmonary Institute, Goethe University Hospital, 60590 Frankfurt am Main, Germany
| | - David B Blumenthal
- Biomedical Network Science Lab, Department Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
| | - Lothar Hennighausen
- Institute for Advanced Study, Technical University of Munich, Garching D-85748, Germany
- National Institute of Diabetes, Digestive, and Kidney Diseases, National Institutes of Health, Bethesda, MD 20892, USA
| | - Markus List
- Big Data in BioMedicine Group, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Freising D-85354, Germany
| |
|
14
|
Gu Y, Zhou Y, Ju S, Liu X, Zhang Z, Guo J, Gao J, Zang J, Sun H, Chen Q, Wang J, Xu J, Xu Y, Chen Y, Guo Y, Dai J, Ma H, Wang C, Jin G, Li C, Xia Y, Shen H, Yang Y, Guo X, Hu Z. Multi-omics profiling visualizes dynamics of cardiac development and functions. Cell Rep 2022; 41:111891. [PMID: 36577384 DOI: 10.1016/j.celrep.2022.111891] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 03/28/2022] [Revised: 09/14/2022] [Accepted: 12/05/2022] [Indexed: 12/29/2022] Open
Abstract
Cardiogenesis is a tightly regulated dynamic process that proceeds through a continuum of differentiation and proliferation events. Key factors and pathways governing this process remain incompletely understood. Here, we investigate mouse hearts from embryonic day 10.5 to postnatal week 8 and dissect developmental changes in the phosphoproteome, proteome, metabolome, and transcriptome encompassing cardiogenesis and cardiac maturation. We identify mitogen-activated protein kinases as core kinases involved in transcriptional regulation by mediating the phosphorylation of chromatin remodeling proteins during early cardiogenesis. We construct the reciprocal regulatory network of transcription factors (TFs) and identify a series of TFs controlling early cardiogenesis involved in cycling-dependent proliferation. After birth, we identify cardiac resident macrophages with high arachidonic acid metabolism activities likely involved in the clearance of injured apoptotic cardiomyocytes. Together, our comprehensive multi-omics data offer a panoramic view of cardiac development and maturation that provides a resource for further in-depth functional exploration.
Affiliation(s)
- Yayun Gu
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, Jiangsu 211100, China; School of Public Health, Center for Global Health, Nanjing Medical University, Nanjing, Jiangsu 211100, China
| | - Yan Zhou
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, Jiangsu 211100, China; School of Public Health, Center for Global Health, Nanjing Medical University, Nanjing, Jiangsu 211100, China
| | - Sihan Ju
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, Jiangsu 211100, China; Department of Neurosurgery, Huashan Hospital, Fudan University, Shanghai 200040, China
| | - Xiaofei Liu
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, Jiangsu 211100, China
| | - Zicheng Zhang
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, Jiangsu 211100, China
| | - Jia Guo
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, Jiangsu 211100, China
| | - Jimiao Gao
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, Jiangsu 211100, China; School of Public Health, Center for Global Health, Nanjing Medical University, Nanjing, Jiangsu 211100, China
| | - Jie Zang
- School of Public Health, Center for Global Health, Nanjing Medical University, Nanjing, Jiangsu 211100, China
| | - Hao Sun
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, Jiangsu 211100, China
| | - Qi Chen
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, Jiangsu 211100, China; School of Public Health, Center for Global Health, Nanjing Medical University, Nanjing, Jiangsu 211100, China
| | - Jinghan Wang
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, Jiangsu 211100, China; School of Public Health, Center for Global Health, Nanjing Medical University, Nanjing, Jiangsu 211100, China
| | - Jiani Xu
- School of Public Health, Center for Global Health, Nanjing Medical University, Nanjing, Jiangsu 211100, China
| | - Yiqun Xu
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, Jiangsu 211100, China; School of Public Health, Center for Global Health, Nanjing Medical University, Nanjing, Jiangsu 211100, China
| | - Yingjia Chen
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, Jiangsu 211100, China
| | - Yueshuai Guo
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, Jiangsu 211100, China
| | - Juncheng Dai
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, Jiangsu 211100, China; School of Public Health, Center for Global Health, Nanjing Medical University, Nanjing, Jiangsu 211100, China
| | - Hongxia Ma
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, Jiangsu 211100, China; School of Public Health, Center for Global Health, Nanjing Medical University, Nanjing, Jiangsu 211100, China
| | - Cheng Wang
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, Jiangsu 211100, China
| | - Guangfu Jin
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, Jiangsu 211100, China; School of Public Health, Center for Global Health, Nanjing Medical University, Nanjing, Jiangsu 211100, China
| | - Chaojun Li
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, Jiangsu 211100, China
| | - Yankai Xia
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, Jiangsu 211100, China; School of Public Health, Center for Global Health, Nanjing Medical University, Nanjing, Jiangsu 211100, China
| | - Hongbing Shen
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, Jiangsu 211100, China; School of Public Health, Center for Global Health, Nanjing Medical University, Nanjing, Jiangsu 211100, China
| | - Yang Yang
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, Jiangsu 211100, China
| | - Xuejiang Guo
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, Jiangsu 211100, China
| | - Zhibin Hu
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, Jiangsu 211100, China; School of Public Health, Center for Global Health, Nanjing Medical University, Nanjing, Jiangsu 211100, China.
| |
|
15
|
Rivière Q, Corso M, Ciortan M, Noël G, Verbruggen N, Defrance M. Exploiting Genomic Features to Improve the Prediction of Transcription Factor-Binding Sites in Plants. Plant & Cell Physiology 2022; 63:1457-1473. [PMID: 35799371 DOI: 10.1093/pcp/pcac095] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2021] [Revised: 06/07/2022] [Accepted: 07/06/2022] [Indexed: 06/15/2023]
Abstract
The identification of transcription factor (TF) target genes is central to biology. A popular approach is to locate potential cis-regulatory elements (CREs) by pattern matching. During the last few years, tools integrating next-generation sequencing data have been developed to improve the performance of pattern matching. However, such tools have not yet been comprehensively evaluated in plants. Hence, we developed a new streamlined method aiming at predicting CREs and target genes of plant TFs in specific organs or conditions. Our approach implements a supervised machine learning strategy, which allows decision rule models to be learnt using TF ChIP-chip/seq experimental data. Different layers of genomic features were integrated in predictive models: the position on the gene, the DNA sequence conservation, the chromatin state and various CRE footprints. Among the tested features, the chromatin features were crucial for improving the accuracy of the method. Furthermore, we evaluated the transferability of predictive models across TFs, organs and species. Finally, we validated our method by correctly inferring the target genes of key TFs controlling metabolite biosynthesis at the organ level in Arabidopsis. We developed a tool, Wimtrap, to reproduce our approach in plant species and conditions/organs for which ChIP-chip/seq data are available. Wimtrap is a user-friendly R package that supports an R Shiny web interface and is provided with pre-built models that can be used to quickly get predictions of CREs and TF gene targets in different organs or conditions in Arabidopsis thaliana, Solanum lycopersicum, Oryza sativa and Zea mays.
Affiliation(s)
- Quentin Rivière
- Brussels Bioengineering School, Laboratory of Plant Physiology and Molecular Genetics, Université Libre de Bruxelles, Brussels 1050, Belgium
- Massimiliano Corso
- Brussels Bioengineering School, Laboratory of Plant Physiology and Molecular Genetics, Université Libre de Bruxelles, Brussels 1050, Belgium
- INRAE, AgroParisTech, Institut Jean-Pierre Bourgin (IJPB), Université Paris-Saclay, Versailles 78000, France
- Madalina Ciortan
- Interuniversity Institute of Bioinformatics in Brussels, Machine Learning Group, Université Libre de Bruxelles, Brussels 1050, Belgium
- Grégoire Noël
- Functional and Evolutionary Entomology, Gembloux Agro-Bio Tech, University of Liège, Passage des Déportés 2, Gembloux 5030, Belgium
- Nathalie Verbruggen
- Brussels Bioengineering School, Laboratory of Plant Physiology and Molecular Genetics, Université Libre de Bruxelles, Brussels 1050, Belgium
- Matthieu Defrance
- Interuniversity Institute of Bioinformatics in Brussels, Machine Learning Group, Université Libre de Bruxelles, Brussels 1050, Belgium
16
Towards a better understanding of TF-DNA binding prediction from genomic features. Comput Biol Med 2022; 149:105993. [DOI: 10.1016/j.compbiomed.2022.105993] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2022] [Revised: 07/12/2022] [Accepted: 08/14/2022] [Indexed: 11/17/2022]
17
Yashar WM, Kong G, VanCampen J, Curtiss BM, Coleman DJ, Carbone L, Yardimci GG, Maxson JE, Braun TP. GoPeaks: histone modification peak calling for CUT&Tag. Genome Biol 2022; 23:144. [PMID: 35788238 PMCID: PMC9252088 DOI: 10.1186/s13059-022-02707-w] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 01/15/2022] [Accepted: 06/15/2022] [Indexed: 12/27/2022] Open
Abstract
Genome-wide mapping of histone modifications is critical to understanding transcriptional regulation. CUT&Tag is a new method for profiling histone modifications, offering improved sensitivity and decreased cost compared with ChIP-seq. Here, we present GoPeaks, a peak calling method specifically designed for histone modification CUT&Tag data. We compare the performance of GoPeaks against commonly used peak calling algorithms to detect histone modifications that display a range of peak profiles and are frequently used in epigenetic studies. We find that GoPeaks robustly detects genome-wide histone modifications and, notably, identifies a substantial number of H3K27ac peaks with improved sensitivity compared to other standard algorithms.
Affiliation(s)
- William M. Yashar
- Knight Cancer Institute, Oregon Health & Science University, Portland, USA
- Department of Biomedical Engineering, Oregon Health & Science University, Portland, USA
- Garth Kong
- Knight Cancer Institute, Oregon Health & Science University, Portland, USA
- Jake VanCampen
- Knight Cancer Institute, Oregon Health & Science University, Portland, USA
- Daniel J. Coleman
- Knight Cancer Institute, Oregon Health & Science University, Portland, USA
- Lucia Carbone
- Knight Cardiovascular Institute, Oregon Health & Science University, Portland, USA
- Galip Gürkan Yardimci
- Knight Cancer Institute, Oregon Health & Science University, Portland, USA
- Center for Early Cancer Detection, Oregon Health & Science University, Portland, USA
- Julia E. Maxson
- Knight Cancer Institute, Oregon Health & Science University, Portland, USA
- Division of Oncologic Sciences, Oregon Health & Science University, Portland, USA
- Theodore P. Braun
- Knight Cancer Institute, Oregon Health & Science University, Portland, USA
- Division of Oncologic Sciences, Oregon Health & Science University, Portland, USA
- Division of Hematology & Medical Oncology, Oregon Health & Science University, Portland, USA
18
Hesami M, Alizadeh M, Jones AMP, Torkamaneh D. Machine learning: its challenges and opportunities in plant system biology. Appl Microbiol Biotechnol 2022; 106:3507-3530. [PMID: 35575915 DOI: 10.1007/s00253-022-11963-6] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 03/14/2022] [Revised: 03/14/2022] [Accepted: 05/07/2022] [Indexed: 12/25/2022]
Abstract
Sequencing technologies are evolving at a rapid pace, enabling the generation of massive amounts of data in multiple dimensions (e.g., genomics, epigenomics, transcriptomics, metabolomics, proteomics, and single-cell omics) in plants. To provide comprehensive insights into the complexity of plant biological systems, it is important to integrate different omics datasets. Although recent advances in computational analytical pipelines have enabled efficient and high-quality exploration and exploitation of single omics data, the integration of multidimensional, heterogeneous, and large datasets (i.e., multi-omics) remains a challenge. In this regard, machine learning (ML) offers promising approaches to integrate large datasets and to recognize fine-grained patterns and relationships. Nevertheless, they require rigorous optimizations to process multi-omics-derived datasets. In this review, we discuss the main concepts of machine learning as well as the key challenges and solutions related to the big data derived from plant system biology. We also provide in-depth insight into the principles of data integration using ML, as well as challenges and opportunities in different contexts including multi-omics, single-cell omics, protein function, and protein-protein interaction. KEY POINTS: • The key challenges and solutions related to the big data derived from plant system biology have been highlighted. • Different methods of data integration have been discussed. • Challenges and opportunities of the application of machine learning in plant system biology have been highlighted and discussed.
Affiliation(s)
- Mohsen Hesami
- Department of Plant Agriculture, University of Guelph, Guelph, ON, N1G 2W1, Canada
- Milad Alizadeh
- Department of Botany, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
- Davoud Torkamaneh
- Département de Phytologie, Université Laval, Québec City, QC, G1V 0A6, Canada
- Institut de Biologie Intégrative Et Des Systèmes (IBIS), Université Laval, Québec City, QC, G1V 0A6, Canada
19
de Medeiros Oliveira M, Bonadio I, Lie de Melo A, Mendes Souza G, Durham AM. TSSFinder-fast and accurate ab initio prediction of the core promoter in eukaryotic genomes. Brief Bioinform 2021; 22:bbab198. [PMID: 34050351 PMCID: PMC8574697 DOI: 10.1093/bib/bbab198] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Revised: 02/14/2021] [Accepted: 02/23/2021] [Indexed: 12/02/2022] Open
Abstract
Promoter annotation is an important task in the analysis of a genome. One of the main challenges for this task is locating the border between the promoter region and the transcribing region of the gene, the transcription start site (TSS). The TSS is the reference point to delimit the DNA sequence responsible for the assembly of the transcribing complex. Because the same gene can have more than one TSS, delimiting the promoter region requires locating the TSS closest to the site where translation begins. This paper presents TSSFinder, new software for the prediction of the TSS signal of eukaryotic genes that is significantly more accurate than other available software. TSSFinder is currently the only application to offer pre-trained models for six different eukaryotic organisms: Arabidopsis thaliana, Drosophila melanogaster, Gallus gallus, Homo sapiens, Oryza sativa and Saccharomyces cerevisiae. Additionally, our software can be easily customized for specific organisms using only 125 DNA sequences with a validated TSS signal and corresponding genomic locations as a training set. TSSFinder is a valuable new tool for the annotation of genomes. TSSFinder source code and docker container can be downloaded from http://tssfinder.github.io. Alternatively, TSSFinder is also available as a web service at http://sucest-fun.org/wsapp/tssfinder/.
Affiliation(s)
- Igor Bonadio
- Data Science, Elo7 Research Lab, São Paulo, Brazil
20
Caudai C, Galizia A, Geraci F, Le Pera L, Morea V, Salerno E, Via A, Colombo T. AI applications in functional genomics. Comput Struct Biotechnol J 2021; 19:5762-5790. [PMID: 34765093 PMCID: PMC8566780 DOI: 10.1016/j.csbj.2021.10.009] [Citation(s) in RCA: 42] [Impact Index Per Article: 10.5] [Received: 04/16/2021] [Revised: 10/05/2021] [Accepted: 10/05/2021] [Indexed: 12/13/2022] Open
Abstract
We review the current applications of artificial intelligence (AI) in functional genomics. The recent explosion of AI follows the remarkable achievements made possible by "deep learning", along with a burst of "big data" that can meet its hunger. Biology is about to overthrow astronomy as the paradigmatic representative of big data producers. This has been made possible by huge advancements in the field of high throughput technologies, applied to determine how the individual components of a biological system work together to accomplish different processes. The disciplines contributing to this bulk of data are collectively known as functional genomics. They consist of studies of: i) the information contained in the DNA (genomics); ii) the modifications that DNA can reversibly undergo (epigenomics); iii) the RNA transcripts originated by a genome (transcriptomics); iv) the ensemble of chemical modifications decorating different types of RNA transcripts (epitranscriptomics); v) the products of protein-coding transcripts (proteomics); and vi) the small molecules produced from cell metabolism (metabolomics) present in an organism or system at a given time, in physiological or pathological conditions. After reviewing the main applications of AI in functional genomics, we discuss important accompanying issues, including ethical, legal and economic issues and the importance of explainability.
Affiliation(s)
- Claudia Caudai
- CNR, Institute of Information Science and Technologies “A. Faedo” (ISTI), Pisa, Italy
- Antonella Galizia
- CNR, Institute of Applied Mathematics and Information Technologies (IMATI), Genoa, Italy
- Filippo Geraci
- CNR, Institute for Informatics and Telematics (IIT), Pisa, Italy
- Loredana Le Pera
- CNR, Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies (IBIOM), Bari, Italy
- CNR, Institute of Molecular Biology and Pathology (IBPM), Rome, Italy
- Veronica Morea
- CNR, Institute of Molecular Biology and Pathology (IBPM), Rome, Italy
- Emanuele Salerno
- CNR, Institute of Information Science and Technologies “A. Faedo” (ISTI), Pisa, Italy
- Allegra Via
- CNR, Institute of Molecular Biology and Pathology (IBPM), Rome, Italy
- Teresa Colombo
- CNR, Institute of Molecular Biology and Pathology (IBPM), Rome, Italy
21
Sofiadis K, Josipovic N, Nikolic M, Kargapolova Y, Übelmesser N, Varamogianni‐Mamatsi V, Zirkel A, Papadionysiou I, Loughran G, Keane J, Michel A, Gusmao EG, Becker C, Altmüller J, Georgomanolis T, Mizi A, Papantonis A. HMGB1 coordinates SASP-related chromatin folding and RNA homeostasis on the path to senescence. Mol Syst Biol 2021; 17:e9760. [PMID: 34166567 PMCID: PMC8224457 DOI: 10.15252/msb.20209760] [Citation(s) in RCA: 55] [Impact Index Per Article: 13.8] [Received: 05/30/2020] [Revised: 05/17/2021] [Accepted: 05/19/2021] [Indexed: 12/15/2022] Open
Abstract
Spatial organization and gene expression of mammalian chromosomes are maintained and regulated in conjunction with cell cycle progression. This is perturbed once cells enter senescence and the highly abundant HMGB1 protein is depleted from nuclei to act as an extracellular proinflammatory stimulus. Despite its physiological importance, we know little about the positioning of HMGB1 on chromatin and its nuclear roles. To address this, we mapped HMGB1 binding genome-wide in two primary cell lines. We integrated ChIP-seq and Hi-C with graph theory to uncover clustering of HMGB1-marked topological domains that harbor genes involved in paracrine senescence. Using simplified Cross-Linking and Immuno-Precipitation and functional tests, we show that HMGB1 is also a bona fide RNA-binding protein (RBP) binding hundreds of mRNAs. It presents an interactome rich in RBPs implicated in senescence regulation. The mRNAs of many of these RBPs are directly bound by HMGB1 and regulate availability of SASP-relevant transcripts. Our findings reveal a broader than hitherto assumed role for HMGB1 in coordinating chromatin folding and RNA homeostasis as part of a regulatory loop controlling cell-autonomous and paracrine senescence.
Affiliation(s)
- Natasa Josipovic
- Institute of Pathology, University Medical Center Göttingen, Göttingen, Germany
- Milos Nikolic
- Center for Molecular Medicine Cologne, University of Cologne, Cologne, Germany
- Yulia Kargapolova
- Center for Molecular Medicine Cologne, University of Cologne, Cologne, Germany
- Present address: Heart Center, University Hospital Cologne, Cologne, Germany
- Nadine Übelmesser
- Institute of Pathology, University Medical Center Göttingen, Göttingen, Germany
- Anne Zirkel
- Center for Molecular Medicine Cologne, University of Cologne, Cologne, Germany
- James Keane
- Ribomaps, Cork, Ireland
- Cork Institute of Technology, Cork, Ireland
- Eduardo G Gusmao
- Institute of Pathology, University Medical Center Göttingen, Göttingen, Germany
- Theodore Georgomanolis
- Center for Molecular Medicine Cologne, University of Cologne, Cologne, Germany
- Cologne Center for Genomics, University of Cologne, Cologne, Germany
- Athanasia Mizi
- Institute of Pathology, University Medical Center Göttingen, Göttingen, Germany
- Argyris Papantonis
- Institute of Pathology, University Medical Center Göttingen, Göttingen, Germany
- Center for Molecular Medicine Cologne, University of Cologne, Cologne, Germany
22
Minnoye L, Marinov GK, Krausgruber T, Pan L, Marand AP, Secchia S, Greenleaf WJ, Furlong EEM, Zhao K, Schmitz RJ, Bock C, Aerts S. Chromatin accessibility profiling methods. NATURE REVIEWS. METHODS PRIMERS 2021; 1:10. [PMID: 38410680 PMCID: PMC10895463 DOI: 10.1038/s43586-020-00008-9] [Citation(s) in RCA: 89] [Impact Index Per Article: 22.3] [Accepted: 12/01/2020] [Indexed: 02/06/2023]
Abstract
Chromatin accessibility, or the physical access to chromatinized DNA, is a widely studied characteristic of the eukaryotic genome. As active regulatory DNA elements are generally 'accessible', the genome-wide profiling of chromatin accessibility can be used to identify candidate regulatory genomic regions in a tissue or cell type. Multiple biochemical methods have been developed to profile chromatin accessibility, both in bulk and at the single-cell level. Depending on the method, enzymatic cleavage, transposition or DNA methyltransferases are used, followed by high-throughput sequencing, providing a view of genome-wide chromatin accessibility. In this Primer, we discuss these biochemical methods, as well as bioinformatics tools for analysing and interpreting the generated data, and insights into the key regulators underlying developmental, evolutionary and disease processes. We outline standards for data quality, reproducibility and deposition used by the genomics community. Although chromatin accessibility profiling is invaluable to study gene regulation, alone it provides only a partial view of this complex process. Orthogonal assays facilitate the interpretation of accessible regions with respect to enhancer-promoter proximity, functional transcription factor binding and regulatory function. We envision that technological improvements including single-molecule, multi-omics and spatial methods will bring further insight into the secrets of genome regulation.
Affiliation(s)
- Liesbeth Minnoye
- Center for Brain & Disease Research, VIB-KU Leuven, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
- Thomas Krausgruber
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Lixia Pan
- Laboratory of Epigenome Biology, Systems Biology Center, Division of Intramural Research, National Heart, Lung and Blood Institute, NIH, Bethesda, MD, USA
- Stefano Secchia
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
- Eileen E M Furlong
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
- Keji Zhao
- Laboratory of Epigenome Biology, Systems Biology Center, Division of Intramural Research, National Heart, Lung and Blood Institute, NIH, Bethesda, MD, USA
- Christoph Bock
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Institute of Artificial Intelligence and Decision Support, Center for Medical Statistics, Informatics, and Intelligent Systems, Medical University of Vienna, Vienna, Austria
- Stein Aerts
- Center for Brain & Disease Research, VIB-KU Leuven, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
23
Wang F, Bai X, Wang Y, Jiang Y, Ai B, Zhang Y, Liu Y, Xu M, Wang Q, Han X, Pan Q, Li Y, Li X, Zhang J, Zhao J, Zhang G, Feng C, Zhu J, Li C. ATACdb: a comprehensive human chromatin accessibility database. Nucleic Acids Res 2021; 49:D55-D64. [PMID: 33125076 PMCID: PMC7779059 DOI: 10.1093/nar/gkaa943] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Received: 08/12/2020] [Revised: 10/05/2020] [Accepted: 10/29/2020] [Indexed: 12/11/2022] Open
Abstract
Accessible chromatin is a highly informative structural feature for identifying regulatory elements, which provides a large amount of information about transcriptional activity and gene regulatory mechanisms. Human ATAC-seq datasets are accumulating rapidly, prompting an urgent need to comprehensively collect and effectively process these data. We developed a comprehensive human chromatin accessibility database (ATACdb, http://www.licpathway.net/ATACdb), with the aim of providing a large amount of publicly available resources on human chromatin accessibility data, and to annotate and illustrate potential roles in a tissue/cell type-specific manner. The current version of ATACdb documented a total of 52 078 883 regions from over 1400 ATAC-seq samples. These samples have been manually curated from over 2200 chromatin accessibility samples from NCBI GEO/SRA. To make these datasets more accessible to the research community, ATACdb provides a quality assurance process including four quality control (QC) metrics. ATACdb provides detailed (epi)genetic annotations in chromatin accessibility regions, including super-enhancers, typical enhancers, transcription factors (TFs), common single-nucleotide polymorphisms (SNPs), risk SNPs, eQTLs, LD SNPs, methylations, chromatin interactions and TADs. Especially, ATACdb provides accurate inference of TF footprints within chromatin accessibility regions. ATACdb is a powerful platform that provides the most comprehensive accessible chromatin data, QC, TF footprint and various other annotations.
Affiliation(s)
- Fan Wang
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Xuefeng Bai
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Yuezhu Wang
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Yong Jiang
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Bo Ai
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Yong Zhang
- School of Physics and Electronic Engineering, Northeast Petroleum University, Daqing 163318, China
- Yuejuan Liu
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Mingcong Xu
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Qiuyu Wang
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Xiaole Han
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Qi Pan
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Yanyu Li
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Xuecang Li
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Jian Zhang
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Jun Zhao
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Guorui Zhang
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Chenchen Feng
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Jiang Zhu
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Chunquan Li
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
24
Jing F, Zhang SW, Cao Z, Zhang S. An Integrative Framework for Combining Sequence and Epigenomic Data to Predict Transcription Factor Binding Sites Using Deep Learning. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:355-364. [PMID: 30835229 DOI: 10.1109/tcbb.2019.2901789] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 06/09/2023]
Abstract
Knowing the transcription factor binding sites (TFBSs) is essential for modeling the underlying binding mechanisms and follow-up cellular functions. Convolutional neural networks (CNNs) have outperformed methods in predicting TFBSs from the primary DNA sequence. In addition to DNA sequences, histone modifications and chromatin accessibility are also important factors influencing their activity. They have been explored to predict TFBSs recently. However, current methods rarely take into account histone modifications and chromatin accessibility using CNN in an integrative framework. To this end, we developed a general CNN model to integrate these data for predicting TFBSs. We systematically benchmarked a series of architecture variants by changing network structure in terms of width and depth, and explored the effects of sample length at flanking regions. We evaluated the performance of the three types of data and their combinations using 256 ChIP-seq experiments and also compared it with competing machine learning methods. We find that contributions from these three types of data are complementary to each other. Moreover, the integrative CNN framework is superior to traditional machine learning methods with significant improvements.
25
Abstract
The ATAC-seq assay has emerged as the most useful, versatile, and widely adaptable method for profiling accessible chromatin regions and tracking the activity of cis-regulatory elements (cREs) in eukaryotes. Thanks to its great utility, it is now being applied to map active chromatin in the context of a very wide diversity of biological systems and questions. In the course of these studies, considerable experience working with ATAC-seq data has accumulated and a standard set of computational tasks that need to be carried out for most ATAC-seq analyses has emerged. Here, we review and provide examples of such common analytical procedures (including data processing, quality control, peak calling, identifying differentially accessible open chromatin regions, and variable transcription factor (TF) motif accessibility) and discuss recommended optimal practices.
26
Baloni P, Funk CC, Yan J, Yurkovich JT, Kueider-Paisley A, Nho K, Heinken A, Jia W, Mahmoudiandehkordi S, Louie G, Saykin AJ, Arnold M, Kastenmüller G, Griffiths WJ, Thiele I, Kaddurah-Daouk R, Price ND. Metabolic Network Analysis Reveals Altered Bile Acid Synthesis and Metabolism in Alzheimer's Disease. CELL REPORTS MEDICINE 2020; 1:100138. [PMID: 33294859 PMCID: PMC7691449 DOI: 10.1016/j.xcrm.2020.100138] [Citation(s) in RCA: 108] [Impact Index Per Article: 21.6] [Received: 10/11/2019] [Revised: 06/26/2020] [Accepted: 10/19/2020] [Indexed: 12/12/2022]
Abstract
Increasing evidence suggests Alzheimer's disease (AD) pathophysiology is influenced by primary and secondary bile acids, the end product of cholesterol metabolism. We analyze 2,114 post-mortem brain transcriptomes and identify genes in the alternative bile acid synthesis pathway to be expressed in the brain. A targeted metabolomic analysis of primary and secondary bile acids measured from post-mortem brain samples of 111 individuals supports these results. Our metabolic network analysis suggests that taurine transport, bile acid synthesis, and cholesterol metabolism differ in AD and cognitively normal individuals. We also identify putative transcription factors regulating metabolic genes and influencing altered metabolism in AD. Intriguingly, some bile acids measured in brain tissue cannot be explained by the presence of enzymes responsible for their synthesis, suggesting that they may originate from the gut microbiome and are transported to the brain. These findings motivate further research into bile acid metabolism in AD to elucidate their possible connection to cognitive decline.
Affiliation(s)
- Cory C Funk
- Institute for Systems Biology, Seattle, WA 98109, USA
- Jingwen Yan
- Indiana Alzheimer Disease Center and Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN, USA
- Alexandra Kueider-Paisley
- Department of Psychiatry and Behavioral Medicine, Duke Institute for Brain Sciences, Duke University, Durham, NC 27708, USA
- Kwangsik Nho
- Indiana Alzheimer Disease Center and Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN, USA
- Almut Heinken
- School of Medicine, National University of Ireland, Galway, Ireland
- Wei Jia
- Cancer Biology Program, The University of Hawaii Cancer Center, Honolulu, HI, USA
- Siamak Mahmoudiandehkordi
- Department of Psychiatry and Behavioral Medicine, Duke Institute for Brain Sciences, Duke University, Durham, NC 27708, USA
- Gregory Louie
- Department of Psychiatry and Behavioral Medicine, Duke Institute for Brain Sciences, Duke University, Durham, NC 27708, USA
- Andrew J Saykin
- Indiana Alzheimer Disease Center and Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN, USA
- Matthias Arnold
- Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München-German Research Center for Environmental Health, Neuherberg, Germany
- Gabi Kastenmüller
- Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München-German Research Center for Environmental Health, Neuherberg, Germany
- William J Griffiths
- Swansea University Medical School, ILS1 Building, Singleton Park, Swansea SA2 8PP, UK
- Ines Thiele
- School of Medicine, National University of Ireland, Galway, Ireland
- Discipline of Microbiology, School of Natural Sciences, National University of Ireland, Galway, Ireland
- Rima Kaddurah-Daouk
- Department of Psychiatry and Behavioral Medicine, Duke Institute for Brain Sciences, Duke University, Durham, NC 27708, USA
27
Prager BC, Vasudevan HN, Dixit D, Bernatchez JA, Wu Q, Wallace LC, Bhargava S, Lee D, King BH, Morton AR, Gimple RC, Pekmezci M, Zhu Z, Siqueira-Neto JL, Wang X, Xie Q, Chen C, Barnett GH, Vogelbaum MA, Mack SC, Chavez L, Perry A, Raleigh DR, Rich JN. The Meningioma Enhancer Landscape Delineates Novel Subgroups and Drives Druggable Dependencies. Cancer Discov 2020; 10:1722-1741. [PMID: 32703768 PMCID: PMC8194360 DOI: 10.1158/2159-8290.cd-20-0160] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 02/08/2020] [Revised: 06/06/2020] [Accepted: 07/20/2020] [Indexed: 01/05/2023]
Abstract
Meningiomas are the most common primary intracranial tumor with current classification offering limited therapeutic guidance. Here, we interrogated meningioma enhancer landscapes from 33 tumors to stratify patients based upon prognosis and identify novel meningioma-specific dependencies. Enhancers robustly stratified meningiomas into three biologically distinct groups (adipogenesis/cholesterol, mesodermal, and neural crest) distinguished by distinct hormonal lineage transcriptional regulators. Meningioma landscapes clustered with intrinsic brain tumors and hormonally responsive systemic cancers with meningioma subgroups, reflecting progesterone or androgen hormonal signaling. Enhancer classification identified a subset of tumors with poor prognosis, irrespective of histologic grading. Superenhancer signatures predicted drug dependencies with superior in vitro efficacy to treatment based upon the NF2 genomic profile. Inhibition of DUSP1, a novel and druggable meningioma target, impaired tumor growth in vivo. Collectively, epigenetic landscapes empower meningioma classification and identification of novel therapies. SIGNIFICANCE: Enhancer landscapes inform prognostic classification of aggressive meningiomas, identifying tumors at high risk of recurrence, and reveal previously unknown therapeutic targets. Druggable dependencies discovered through epigenetic profiling potentially guide treatment of intractable meningiomas. This article is highlighted in the In This Issue feature, p. 1611.
Affiliation(s)
- Briana C Prager
- Division of Regenerative Medicine, Department of Medicine, University of California, San Diego, La Jolla, California
- Sanford Consortium for Regenerative Medicine, La Jolla, California
- Cleveland Clinic Lerner College of Medicine, Cleveland Clinic, Cleveland, Ohio
- Case Western Reserve University Medical Scientist Training Program, Case Western Reserve University School of Medicine, Cleveland, Ohio
- Harish N Vasudevan
- Department of Radiation Oncology, University of California, San Francisco, San Francisco, California
- Deobrat Dixit
- Division of Regenerative Medicine, Department of Medicine, University of California, San Diego, La Jolla, California
- Sanford Consortium for Regenerative Medicine, La Jolla, California
- Jean A Bernatchez
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, California
- Center for Discovery and Innovation in Parasitic Diseases, University of California, San Diego, La Jolla, California
- Qiulian Wu
- Division of Regenerative Medicine, Department of Medicine, University of California, San Diego, La Jolla, California
- Sanford Consortium for Regenerative Medicine, La Jolla, California
- Lisa C Wallace
- Department of Biomedical Engineering, Cleveland Clinic, Cleveland, Ohio
- Shruti Bhargava
- Division of Regenerative Medicine, Department of Medicine, University of California, San Diego, La Jolla, California
- Sanford Consortium for Regenerative Medicine, La Jolla, California
- Derrick Lee
- Division of Regenerative Medicine, Department of Medicine, University of California, San Diego, La Jolla, California
- Sanford Consortium for Regenerative Medicine, La Jolla, California
- University of California San Diego School of Medicine, University of California, San Diego, La Jolla, California
- Bradley H King
- Division of Regenerative Medicine, Department of Medicine, University of California, San Diego, La Jolla, California
- Sanford Consortium for Regenerative Medicine, La Jolla, California
- University of California San Diego School of Medicine, University of California, San Diego, La Jolla, California
- Andrew R Morton
- Case Western Reserve University Medical Scientist Training Program, Case Western Reserve University School of Medicine, Cleveland, Ohio
- Ryan C Gimple
- Division of Regenerative Medicine, Department of Medicine, University of California, San Diego, La Jolla, California
- Sanford Consortium for Regenerative Medicine, La Jolla, California
- Case Western Reserve University Medical Scientist Training Program, Case Western Reserve University School of Medicine, Cleveland, Ohio
- Melike Pekmezci
- Department of Pathology, University of California, San Francisco, San Francisco, California
- Zhe Zhu
- Division of Regenerative Medicine, Department of Medicine, University of California, San Diego, La Jolla, California
- Jair L Siqueira-Neto
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, California
- Center for Discovery and Innovation in Parasitic Diseases, University of California, San Diego, La Jolla, California
- Xiuxing Wang
- Division of Regenerative Medicine, Department of Medicine, University of California, San Diego, La Jolla, California
- Sanford Consortium for Regenerative Medicine, La Jolla, California
- School of Basic Medical Sciences, Nanjing Medical University, Nanjing, China
- Qi Xie
- Division of Regenerative Medicine, Department of Medicine, University of California, San Diego, La Jolla, California
- Sanford Consortium for Regenerative Medicine, La Jolla, California
- Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, Westlake University, Hangzhou, China
| | - Clark Chen
- Department of Neurosurgery, University of Minnesota, Minneapolis, Minnesota
| | - Gene H Barnett
- Department of Neurosurgery, Cleveland Clinic, Cleveland, Ohio
- Cleveland Clinic Lerner College of Medicine, Cleveland Clinic, Cleveland, Ohio
| | - Michael A Vogelbaum
- Department of Neurosurgery, University of Minnesota, Minneapolis, Minnesota
- Department of NeuroOncology, Moffitt Cancer Center, Tampa, Florida
| | | | - Lukas Chavez
- Department of Medicine, University of California, San Diego, San Diego, California
| | - Arie Perry
- Department of Pathology, University of California, San Francisco, San Francisco, California
| | - David R Raleigh
- Department of Radiation Oncology, University of California, San Francisco, San Francisco, California
- Department of Neurological Surgery, University of California, San Francisco, San Francisco, California
| | - Jeremy N Rich
- Division of Regenerative Medicine, Department of Medicine, University of California, San Diego, La Jolla, California
- Sanford Consortium for Regenerative Medicine, La Jolla, California
- Department of Neurosciences, University of California, San Diego, La Jolla, California
|
28
|
Funk CC, Casella AM, Jung S, Richards MA, Rodriguez A, Shannon P, Donovan-Maiye R, Heavner B, Chard K, Xiao Y, Glusman G, Ertekin-Taner N, Golde TE, Toga A, Hood L, Van Horn JD, Kesselman C, Foster I, Madduri R, Price ND, Ament SA. Atlas of Transcription Factor Binding Sites from ENCODE DNase Hypersensitivity Data across 27 Tissue Types. Cell Rep 2020; 32:108029. [PMID: 32814038 PMCID: PMC7462736 DOI: 10.1016/j.celrep.2020.108029] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/04/2018] [Revised: 05/07/2020] [Accepted: 07/22/2020] [Indexed: 12/27/2022] Open
Abstract
Characterizing the tissue-specific binding sites of transcription factors (TFs) is essential to reconstruct gene regulatory networks and predict functions for non-coding genetic variation. DNase-seq footprinting enables the prediction of genome-wide binding sites for hundreds of TFs simultaneously. Despite the public availability of high-quality DNase-seq data from hundreds of samples, a comprehensive, up-to-date resource for the locations of genomic footprints is lacking. Here, we develop a scalable footprinting workflow using two state-of-the-art algorithms: Wellington and HINT. We apply our workflow to detect footprints in 192 ENCODE DNase-seq experiments and predict the genomic occupancy of 1,515 human TFs in 27 human tissues. We validate that these footprints overlap true-positive TF binding sites from ChIP-seq. We demonstrate that the locations, depth, and tissue specificity of footprints predict effects of genetic variants on gene expression and capture a substantial proportion of genetic risk for complex traits.
Affiliation(s)
- Cory C Funk
- Institute for Systems Biology, Seattle, WA 98109, USA
| | - Alex M Casella
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA; Medical Scientist Training Program, University of Maryland School of Medicine, Baltimore, MD 21201, USA
| | - Segun Jung
- Globus, University of Chicago, Chicago, IL 60637, USA
| | | | | | - Paul Shannon
- Institute for Systems Biology, Seattle, WA 98109, USA
| | | | - Ben Heavner
- Institute for Systems Biology, Seattle, WA 98109, USA
| | - Kyle Chard
- Globus, University of Chicago, Chicago, IL 60637, USA
| | - Yukai Xiao
- Globus, University of Chicago, Chicago, IL 60637, USA
| | | | | | - Todd E Golde
- Mayo Clinic, Department of Neuroscience, Jacksonville, FL 32224, USA
| | - Arthur Toga
- Mark and Mary Stevens Neuroimaging and Informatics Institute, University of Southern California, Los Angeles, CA 90033, USA
| | - Leroy Hood
- Institute for Systems Biology, Seattle, WA 98109, USA
| | - John D Van Horn
- Department of Psychology, University of Southern California, Los Angeles, CA 90007, USA
| | - Carl Kesselman
- Information Sciences Institute, University of Southern California, Los Angeles, CA 90292, USA
| | - Ian Foster
- Globus, University of Chicago, Chicago, IL 60637, USA; Data Science and Learning Division, Argonne National Laboratory, Argonne, IL 60439, USA
| | - Ravi Madduri
- Globus, University of Chicago, Chicago, IL 60637, USA; Data Science and Learning Division, Argonne National Laboratory, Argonne, IL 60439, USA
| | | | - Seth A Ament
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA; Department of Psychiatry, University of Maryland School of Medicine, Baltimore, MD 21201, USA
|
29
|
Liu Y, Fu L, Kaufmann K, Chen D, Chen M. A practical guide for DNase-seq data analysis: from data management to common applications. Brief Bioinform 2020; 20:1865-1877. [PMID: 30010713 DOI: 10.1093/bib/bby057] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/08/2018] [Revised: 06/06/2018] [Accepted: 06/10/2018] [Indexed: 01/01/2023] Open
Abstract
Deoxyribonuclease I (DNase I)-hypersensitive site sequencing (DNase-seq) has been widely used to determine chromatin accessibility and its underlying regulatory lexicon. However, exploring DNase-seq data requires sophisticated downstream bioinformatics analyses. In this study, we first review computational methods for all of the major steps in DNase-seq data analysis, including experimental design, quality control, read alignment, peak calling, annotation of cis-regulatory elements, genomic footprinting and visualization. The challenges associated with each step are highlighted. Next, we provide a practical guideline and a computational pipeline for DNase-seq data analysis by integrating some of these tools. We also discuss the competing techniques and the potential applications of this pipeline for the analysis of analogous experimental data. Finally, we discuss the integration of DNase-seq with other functional genomics techniques.
Affiliation(s)
- Yongjing Liu
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou 310058, China
| | - Liangyu Fu
- Department for Plant Cell and Molecular Biology, Institute for Biology, Humboldt-Universität zu Berlin, Berlin 10115, Germany
| | - Kerstin Kaufmann
- Department for Plant Cell and Molecular Biology, Institute for Biology, Humboldt-Universität zu Berlin, Berlin 10115, Germany
| | - Dijun Chen
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou 310058, China
| | - Ming Chen
- Department for Plant Cell and Molecular Biology, Institute for Biology, Humboldt-Universität zu Berlin, Berlin 10115, Germany
|
30
|
Höllbacher B, Balázs K, Heinig M, Uhlenhaut NH. Seq-ing answers: Current data integration approaches to uncover mechanisms of transcriptional regulation. Comput Struct Biotechnol J 2020; 18:1330-1341. [PMID: 32612756 PMCID: PMC7306512 DOI: 10.1016/j.csbj.2020.05.018] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/27/2020] [Revised: 05/21/2020] [Accepted: 05/23/2020] [Indexed: 02/06/2023] Open
Abstract
Advancements in the field of next generation sequencing lead to the generation of ever-more data, with the challenge often being how to combine and reconcile results from different OMICs studies such as genome, epigenome and transcriptome. Here we provide an overview of the standard processing pipelines for ChIP-seq and RNA-seq as well as common downstream analyses. We describe popular multi-omics data integration approaches used to identify target genes and co-factors, and we discuss how machine learning techniques may predict transcriptional regulators and gene expression.
Affiliation(s)
- Barbara Höllbacher
- Institute for Diabetes and Cancer IDC, Helmholtz Zentrum Muenchen (HMGU) and German Center for Diabetes Research (DZD), Munich 85764, Neuherberg, Germany.,Institute of Computational Biology ICB, Helmholtz Zentrum Muenchen (HMGU) and German Center for Diabetes Research (DZD), Munich 85764, Neuherberg, Germany.,Department of Informatics, TUM, Munich 85748, Garching, Germany
| | - Kinga Balázs
- Institute for Diabetes and Cancer IDC, Helmholtz Zentrum Muenchen (HMGU) and German Center for Diabetes Research (DZD), Munich 85764, Neuherberg, Germany
| | - Matthias Heinig
- Institute of Computational Biology ICB, Helmholtz Zentrum Muenchen (HMGU) and German Center for Diabetes Research (DZD), Munich 85764, Neuherberg, Germany.,Department of Informatics, TUM, Munich 85748, Garching, Germany
| | - N Henriette Uhlenhaut
- Institute for Diabetes and Cancer IDC, Helmholtz Zentrum Muenchen (HMGU) and German Center for Diabetes Research (DZD), Munich 85764, Neuherberg, Germany.,Metabolic Programming, TUM School of Life Sciences Weihenstephan, Munich 85354, Freising, Germany
|
31
|
van de Geijn B, Finucane H, Gazal S, Hormozdiari F, Amariuta T, Liu X, Gusev A, Loh PR, Reshef Y, Kichaev G, Raychauduri S, Price AL. Annotations capturing cell type-specific TF binding explain a large fraction of disease heritability. Hum Mol Genet 2020; 29:1057-1067. [PMID: 31595288 PMCID: PMC7206853 DOI: 10.1093/hmg/ddz226] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/31/2019] [Revised: 08/12/2019] [Accepted: 09/10/2019] [Indexed: 12/21/2022] Open
Abstract
Regulatory variation plays a major role in complex disease, and cell type-specific binding of transcription factors (TF) is critical to gene regulation. However, assessing the contribution of genetic variation in TF-binding sites to disease heritability is challenging, as binding is often cell type-specific and annotations from directly measured TF binding are not currently available for most cell type-TF pairs. We investigate approaches to annotate TF binding, including directly measured chromatin data and sequence-based predictions. We find that TF-binding annotations constructed by intersecting sequence-based TF-binding predictions with cell type-specific chromatin data explain a large fraction of heritability across a broad set of diseases and corresponding cell types; this strategy of constructing annotations addresses both the limitation that identical sequences may be bound or unbound depending on surrounding chromatin context and the limitation that sequence-based predictions are generally not cell type-specific. We partitioned the heritability of 49 diseases and complex traits using stratified linkage disequilibrium (LD) score regression with the baseline-LD model (which is not cell type-specific) plus the new annotations. We determined that 100 bp windows around MotifMap sequence-based TF-binding predictions intersected with a union of six cell type-specific chromatin marks (imputed using ChromImpute) performed best, with a 58% increase in heritability enrichment compared to the chromatin marks alone (11.6× vs. 7.3×, P = 9 × 10⁻¹⁴ for difference) and a 20% increase in cell type-specific signal conditional on annotations from the baseline-LD model (P = 8 × 10⁻¹¹ for difference). Our results show that TF-binding annotations explain substantial disease heritability and can help refine genome-wide association signals.
Affiliation(s)
- Bryce van de Geijn
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston 02115, MA, USA
| | - Hilary Finucane
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Steven Gazal
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston 02115, MA, USA
| | - Farhad Hormozdiari
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston 02115, MA, USA
| | - Tiffany Amariuta
- Center for Data Sciences, Harvard Medical School, Boston, MA 02215, USA
- Divisions of Genetics, Rheumatology, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02215, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02215, USA
- Graduate School of Arts and Sciences, Harvard University, Boston, MA 02215, USA
| | - Xuanyao Liu
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston 02115, MA, USA
| | | | - Po-Ru Loh
- Brigham and Women’s Hospital, Boston, MA 02215, USA
| | - Yakir Reshef
- Department of Computer Science, Harvard University, Cambridge, MA 02138, USA
- Harvard/MIT MD/PhD Program, Boston, MA 02215, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02215, USA
| | - Gleb Kichaev
- Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA 90095, USA
| | - Soumya Raychauduri
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston 02115, MA, USA
- Center for Data Sciences, Harvard Medical School, Boston, MA 02215, USA
- Divisions of Genetics, Rheumatology, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02215, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02215, USA
- Graduate School of Arts and Sciences, Harvard University, Boston, MA 02215, USA
| | - Alkes L Price
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston 02115, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02215, USA
|
32
|
Hovhannisyan H, Saus E, Ksiezopolska E, Hinks Roberts AJ, Louis EJ, Gabaldón T. Integrative Omics Analysis Reveals a Limited Transcriptional Shock After Yeast Interspecies Hybridization. Front Genet 2020; 11:404. [PMID: 32457798 PMCID: PMC7221068 DOI: 10.3389/fgene.2020.00404] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/05/2020] [Accepted: 03/30/2020] [Indexed: 12/30/2022] Open
Abstract
The formation of interspecific hybrids results in the coexistence of two diverged genomes within the same nucleus. It has been hypothesized that negative epistatic interactions and regulatory interferences between the two sub-genomes may elicit a so-called genomic shock involving, among other alterations, broad transcriptional changes. To assess the magnitude of this shock in hybrid yeasts, we investigated the transcriptomic differences between a newly formed Saccharomyces cerevisiae × Saccharomyces uvarum diploid hybrid and its diploid parentals, which diverged ∼20 mya. RNA sequencing (RNA-Seq) based allele-specific expression (ASE) analysis indicated that gene expression changes in the hybrid genome are limited, with only ∼1-2% of genes significantly altering their expression with respect to a non-hybrid context. In comparison, a thermal shock altered six times more genes. Furthermore, differences in the expression between orthologous genes in the two parental species tended to be diminished for the corresponding homeologous genes in the hybrid. Finally, and consistent with the RNA-Seq results, we show a limited impact of hybridization on chromatin accessibility patterns, as assessed with assay for transposase-accessible chromatin using sequencing (ATAC-Seq). Overall, our results suggest a limited genomic shock in a newly formed yeast hybrid, which may explain the high frequency of successful hybridization in these organisms.
Affiliation(s)
- Hrant Hovhannisyan
- Centre for Genomic Regulation, Barcelona Institute of Science and Technology, Barcelona, Spain
- Department of Health and Life Sciences, Universitat Pompeu Fabra, Barcelona, Spain
| | - Ester Saus
- Centre for Genomic Regulation, Barcelona Institute of Science and Technology, Barcelona, Spain
- Department of Health and Life Sciences, Universitat Pompeu Fabra, Barcelona, Spain
| | - Ewa Ksiezopolska
- Centre for Genomic Regulation, Barcelona Institute of Science and Technology, Barcelona, Spain
- Department of Health and Life Sciences, Universitat Pompeu Fabra, Barcelona, Spain
| | - Alex J. Hinks Roberts
- Centre for Genetic Architecture of Complex Traits, University of Leicester, Leicester, United Kingdom
| | - Edward J. Louis
- Centre for Genetic Architecture of Complex Traits, University of Leicester, Leicester, United Kingdom
| | - Toni Gabaldón
- Centre for Genomic Regulation, Barcelona Institute of Science and Technology, Barcelona, Spain
- Department of Health and Life Sciences, Universitat Pompeu Fabra, Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats, Barcelona, Spain
|
33
|
Yan F, Powell DR, Curtis DJ, Wong NC. From reads to insight: a hitchhiker's guide to ATAC-seq data analysis. Genome Biol 2020; 21:22. [PMID: 32014034 PMCID: PMC6996192 DOI: 10.1186/s13059-020-1929-3] [Citation(s) in RCA: 249] [Impact Index Per Article: 49.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/22/2019] [Accepted: 01/08/2020] [Indexed: 12/16/2022] Open
Abstract
Assay of Transposase Accessible Chromatin sequencing (ATAC-seq) is widely used in studying chromatin biology, but a comprehensive review of the analysis tools has not been completed yet. Here, we discuss the major steps in ATAC-seq data analysis, including pre-analysis (quality check and alignment), core analysis (peak calling), and advanced analysis (peak differential analysis and annotation, motif enrichment, footprinting, and nucleosome position analysis). We also review the reconstruction of transcriptional regulatory networks with multiomics data and highlight the current challenges of each step. Finally, we describe the potential of single-cell ATAC-seq and highlight the necessity of developing ATAC-seq specific analysis tools to obtain biologically meaningful insights.
Affiliation(s)
- Feng Yan
- Australian Centre for Blood Diseases, Central Clinical School, Monash University, Melbourne, VIC, Australia
| | - David R Powell
- Monash Bioinformatics Platform, Monash University, Melbourne, VIC, Australia
| | - David J Curtis
- Australian Centre for Blood Diseases, Central Clinical School, Monash University, Melbourne, VIC, Australia.,Department of Clinical Haematology, Alfred Health, Melbourne, VIC, Australia
| | - Nicholas C Wong
- Australian Centre for Blood Diseases, Central Clinical School, Monash University, Melbourne, VIC, Australia.,Monash Bioinformatics Platform, Monash University, Melbourne, VIC, Australia
|
34
|
Xu T, Zheng X, Li B, Jin P, Qin Z, Wu H. A comprehensive review of computational prediction of genome-wide features. Brief Bioinform 2020; 21:120-134. [PMID: 30462144 PMCID: PMC10233247 DOI: 10.1093/bib/bby110] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/10/2018] [Revised: 10/15/2018] [Accepted: 10/16/2018] [Indexed: 12/15/2022] Open
Abstract
There are significant correlations among different types of genetic, genomic and epigenomic features within the genome. These correlations make the in silico feature prediction possible through statistical or machine learning models. With the accumulation of a vast amount of high-throughput data, feature prediction has gained significant interest lately, and a plethora of papers have been published in the past few years. Here we provide a comprehensive review on these published works, categorized by the prediction targets, including protein binding site, enhancer, DNA methylation, chromatin structure and gene expression. We also provide discussions on some important points and possible future directions.
Affiliation(s)
- Tianlei Xu
- Department of Mathematics and Computer Science, Emory University, Atlanta, GA, USA
| | - Xiaoqi Zheng
- Department of Mathematics, Shanghai Normal University, Shanghai, China
| | - Ben Li
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA, USA
| | - Peng Jin
- Department of Human Genetics, Rollins School of Public Health, Emory University, Atlanta, GA, USA
| | - Zhaohui Qin
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA, USA
| | - Hao Wu
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA, USA
|
35
|
Behjati Ardakani F, Schmidt F, Schulz MH. Predicting transcription factor binding using ensemble random forest models. F1000Res 2019; 7:1603. [PMID: 31723409 PMCID: PMC6823902 DOI: 10.12688/f1000research.16200.2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Accepted: 08/15/2019] [Indexed: 12/03/2022] Open
Abstract
Background: Understanding the location and cell-type specific binding of Transcription Factors (TFs) is important in the study of gene regulation. Computational prediction of TF binding sites is challenging, because TFs often bind only to short DNA motifs and cell-type specific co-factors may work together with the same TF to determine binding. Here, we consider the problem of learning a general model for the prediction of TF binding using DNase1-seq data and TF motif description in the form of position specific energy matrices (PSEMs). Methods: We use TF ChIP-seq data as a gold-standard for model training and evaluation. Our contribution is a novel ensemble learning approach using random forest classifiers. In the context of the ENCODE-DREAM in vivo TF binding site prediction challenge we consider different learning setups. Results: Our results indicate that the ensemble learning approach is able to better generalize across tissues and cell-types compared to individual tissue-specific classifiers or a classifier built based upon data aggregated across tissues. Furthermore, we show that incorporating DNase1-seq peaks is essential to reduce the false positive rate of TF binding predictions compared to considering the raw DNase1 signal. Conclusions: Analysis of important features reveals that the models preferentially select motifs of other TFs that are close interaction partners in existing protein-protein interaction networks. Code generated in the scope of this project is available on GitHub: https://github.com/SchulzLab/TFAnalysis (DOI: 10.5281/zenodo.1409697).
Affiliation(s)
- Fatemeh Behjati Ardakani
- High throughput Genomics and Systems Biology, Cluster of Excellence on Multimodal Computing and Interaction, Saarland University, Saarbruecken, Saarland, 66123, Germany.,Computational Biology and Applied Algorithmics, Max Planck Institute for Informatics, Saarbruecken, Saarland, 66123, Germany.,Graduate School of Computer Science, Saarland University, Saarbruecken, Saarland, 66123, Germany
| | - Florian Schmidt
- High throughput Genomics and Systems Biology, Cluster of Excellence on Multimodal Computing and Interaction, Saarland University, Saarbruecken, Saarland, 66123, Germany.,Computational Biology and Applied Algorithmics, Max Planck Institute for Informatics, Saarbruecken, Saarland, 66123, Germany.,Graduate School of Computer Science, Saarland University, Saarbruecken, Saarland, 66123, Germany.,Computational Systems Biology, Genome Institute of Singapore, Singapore, Singapore
| | - Marcel H Schulz
- High throughput Genomics and Systems Biology, Cluster of Excellence on Multimodal Computing and Interaction, Saarland University, Saarbruecken, Saarland, 66123, Germany.,Computational Biology and Applied Algorithmics, Max Planck Institute for Informatics, Saarbruecken, Saarland, 66123, Germany.,Institute for Cardiovascular Regeneration, Goethe University Frankfurt Am Main, Frankfurt Am Main, Hessen, 60590, Germany
|
36
|
Zitnik M, Nguyen F, Wang B, Leskovec J, Goldenberg A, Hoffman MM. Machine Learning for Integrating Data in Biology and Medicine: Principles, Practice, and Opportunities. AN INTERNATIONAL JOURNAL ON INFORMATION FUSION 2019; 50:71-91. [PMID: 30467459 PMCID: PMC6242341 DOI: 10.1016/j.inffus.2018.09.012] [Citation(s) in RCA: 262] [Impact Index Per Article: 43.7] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/10/2023]
Abstract
New technologies have enabled the investigation of biology and human health at an unprecedented scale and in multiple dimensions. These dimensions include myriad properties describing genome, epigenome, transcriptome, microbiome, phenotype, and lifestyle. No single data type, however, can capture the complexity of all the factors relevant to understanding a phenomenon such as a disease. Integrative methods that combine data from multiple technologies have thus emerged as critical statistical and computational approaches. The key challenge in developing such approaches is the identification of effective models to provide a comprehensive and relevant systems view. An ideal method can answer a biological or medical question, identifying important features and predicting outcomes, by harnessing heterogeneous data across several dimensions of biological variation. In this Review, we describe the principles of data integration and discuss current methods and available implementations. We provide examples of successful data integration in biology and medicine. Finally, we discuss current challenges in biomedical integrative methods and our perspective on the future development of the field.
Affiliation(s)
- Marinka Zitnik
- Department of Computer Science, Stanford University, Stanford, CA, USA
| | - Francis Nguyen
- Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
- Princess Margaret Cancer Centre, Toronto, ON, Canada
| | - Bo Wang
- Hikvision Research Institute, Santa Clara, CA, USA
| | - Jure Leskovec
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Chan Zuckerberg Biohub, San Francisco, CA, USA
| | - Anna Goldenberg
- Genetics & Genome Biology, SickKids Research Institute, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Vector Institute, Toronto, ON, Canada
| | - Michael M. Hoffman
- Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
- Princess Margaret Cancer Centre, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Vector Institute, Toronto, ON, Canada
|
37
|
Youn A, Marquez EJ, Lawlor N, Stitzel ML, Ucar D. BiFET: sequencing Bias-free transcription factor Footprint Enrichment Test. Nucleic Acids Res 2019; 47:e11. [PMID: 30428075 PMCID: PMC6344870 DOI: 10.1093/nar/gky1117] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/19/2018] [Accepted: 10/23/2018] [Indexed: 01/15/2023] Open
Abstract
Transcription factor (TF) footprinting uncovers putative protein–DNA binding via combined analyses of chromatin accessibility patterns and their underlying TF sequence motifs. TF footprints are frequently used to identify TFs that regulate activities of cell/condition-specific genomic regions (target loci) in comparison to control regions (background loci) using standard enrichment tests. However, there is a strong association between the chromatin accessibility level and the GC content of a locus and the number and types of TF footprints that can be detected at this site. Traditional enrichment tests (e.g. hypergeometric) do not account for this bias and inflate false positive associations. Therefore, we developed a novel post-processing method, Bias-free Footprint Enrichment Test (BiFET), that corrects for the biases arising from the differences in chromatin accessibility levels and GC contents between target and background loci in footprint enrichment analyses. We applied BiFET on TF footprint calls obtained from EndoC-βH1 ATAC-seq samples using three different algorithms (CENTIPEDE, HINT-BC and PIQ) and showed BiFET’s ability to increase power and reduce false positive rate when compared to hypergeometric test. Furthermore, we used BiFET to study TF footprints from human PBMC and pancreatic islet ATAC-seq samples to show its utility to identify putative TFs associated with cell-type-specific loci.
Affiliation(s)
- Ahrim Youn
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | - Eladio J Marquez
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | - Nathan Lawlor
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | - Michael L Stitzel
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA.,Institute for Systems Genomics, University of Connecticut Health Center, Farmington, CT 06030, USA.,Department of Genetics & Genome Sciences, University of Connecticut Health Center, Farmington, CT 06030, USA
| | - Duygu Ucar
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA.,Institute for Systems Genomics, University of Connecticut Health Center, Farmington, CT 06030, USA.,Department of Genetics & Genome Sciences, University of Connecticut Health Center, Farmington, CT 06030, USA
|
38
|
Madduri R, Chard K, D’Arcy M, Jung SC, Rodriguez A, Sulakhe D, Deutsch E, Funk C, Heavner B, Richards M, Shannon P, Glusman G, Price N, Kesselman C, Foster I. Reproducible big data science: A case study in continuous FAIRness. PLoS One 2019; 14:e0213013. [PMID: 30973881 PMCID: PMC6459504 DOI: 10.1371/journal.pone.0213013] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/21/2018] [Accepted: 02/13/2019] [Indexed: 01/22/2023] Open
Abstract
Big biomedical data create exciting opportunities for discovery, but make it difficult to capture analyses and outputs in forms that are findable, accessible, interoperable, and reusable (FAIR). In response, we describe tools that make it easy to capture, and assign identifiers to, data and code throughout the data lifecycle. We illustrate the use of these tools via a case study involving a multi-step analysis that creates an atlas of putative transcription factor binding sites from terabytes of ENCODE DNase I hypersensitive sites sequencing data. We show how the tools automate routine but complex tasks, capture analysis algorithms in understandable and reusable forms, and harness fast networks and powerful cloud computers to process data rapidly, all without sacrificing usability or reproducibility, thus ensuring that big data are not hard-to-(re)use data. We evaluate our approach via a user study, and show that 91% of participants were able to replicate a complex analysis involving considerable data volumes.
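One generic way to assign a stable identifier to a piece of data or code is to derive it from a checksum of the content. The sketch below illustrates only that general idea; it is not the tooling described in the abstract, and the function name and the `sha256:` prefix are assumptions made for illustration.

```python
import hashlib


def content_identifier(data: bytes) -> str:
    """Return a content-derived identifier for a block of bytes.

    The identifier is the SHA-256 digest of the content, so identical
    content always maps to the same identifier and any change to the
    content produces a different one.
    """
    return "sha256:" + hashlib.sha256(data).hexdigest()


# Hypothetical usage: identify an analysis script and one of its inputs.
script_id = content_identifier(b"print('call footprints')\n")
input_id = content_identifier(b"chr1\t100\t200\n")
```

Recording such identifiers next to each analysis step lets a later reader check that they are re-running the same code on the same data.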
Affiliation(s)
- Ravi Madduri
- Globus, University of Chicago, Chicago, Illinois, United States of America
- Data Science and Learning Division, Argonne National Laboratory, Lemont, Illinois, United States of America
| | - Kyle Chard
- Globus, University of Chicago, Chicago, Illinois, United States of America
- Data Science and Learning Division, Argonne National Laboratory, Lemont, Illinois, United States of America
| | - Mike D’Arcy
- Information Sciences Institute, University of Southern California, Los Angeles, California, United States of America
| | - Segun C. Jung
- Globus, University of Chicago, Chicago, Illinois, United States of America
- Data Science and Learning Division, Argonne National Laboratory, Lemont, Illinois, United States of America
| | - Alexis Rodriguez
- Globus, University of Chicago, Chicago, Illinois, United States of America
- Data Science and Learning Division, Argonne National Laboratory, Lemont, Illinois, United States of America
| | - Dinanath Sulakhe
- Globus, University of Chicago, Chicago, Illinois, United States of America
- Data Science and Learning Division, Argonne National Laboratory, Lemont, Illinois, United States of America
| | - Eric Deutsch
- Institute for Systems Biology, Seattle, Washington, United States of America
| | - Cory Funk
- Institute for Systems Biology, Seattle, Washington, United States of America
| | - Ben Heavner
- Department of Biostatistics, School of Public Health, University of Washington, Seattle, Washington, United States of America
| | - Matthew Richards
- Institute for Systems Biology, Seattle, Washington, United States of America
| | - Paul Shannon
- Institute for Systems Biology, Seattle, Washington, United States of America
| | - Gustavo Glusman
- Institute for Systems Biology, Seattle, Washington, United States of America
| | - Nathan Price
- Institute for Systems Biology, Seattle, Washington, United States of America
| | - Carl Kesselman
- Information Sciences Institute, University of Southern California, Los Angeles, California, United States of America
| | - Ian Foster
- Globus, University of Chicago, Chicago, Illinois, United States of America
- Data Science and Learning Division, Argonne National Laboratory, Lemont, Illinois, United States of America
- Department of Computer Science, University of Chicago, Chicago, Illinois, United States of America
|
39
|
Li Z, Schulz MH, Look T, Begemann M, Zenke M, Costa IG. Identification of transcription factor binding sites using ATAC-seq. Genome Biol 2019; 20:45. [PMID: 30808370 PMCID: PMC6391789 DOI: 10.1186/s13059-019-1642-2] [Citation(s) in RCA: 274] [Impact Index Per Article: 45.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/13/2018] [Accepted: 01/25/2019] [Indexed: 01/07/2023] Open
Abstract
Transposase-Accessible Chromatin followed by sequencing (ATAC-seq) is a simple protocol for detection of open chromatin. Computational footprinting, the search for regions with depletion of cleavage events due to transcription factor binding, is poorly understood for ATAC-seq. We propose the first footprinting method considering ATAC-seq protocol artifacts. HINT-ATAC uses a position dependency model to learn the cleavage preferences of the transposase. We observe strand-specific cleavage patterns around transcription factor binding sites, which are determined by local nucleosome architecture. By incorporating all these biases, HINT-ATAC is able to significantly outperform competing methods in the prediction of transcription factor binding sites with footprints.
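The notion of a footprint as a local depletion of cleavage events can be made concrete with a toy score. This is a simplified sketch under stated assumptions, not the HINT-ATAC model: it applies no correction for transposase cleavage bias or strand-specific patterns, and the function name, the `flank` default, and the example counts are hypothetical.

```python
def footprint_depletion_score(cuts, center_start, center_end, flank=10):
    """Mean flank cut count minus mean center cut count.

    `cuts` holds per-base cleavage counts across an accessible region, and
    [center_start, center_end) is the candidate binding site. A positive
    score means fewer cleavage events inside the site than in its flanks,
    which is the footprint-like dip described in the abstract.
    """
    if not (0 <= center_start < center_end <= len(cuts)):
        raise ValueError("center window must lie inside the signal")
    left = cuts[max(0, center_start - flank):center_start]
    right = cuts[center_end:center_end + flank]
    flanks = list(left) + list(right)
    if not flanks:
        raise ValueError("no flanking positions available")
    center = cuts[center_start:center_end]
    return sum(flanks) / len(flanks) - sum(center) / len(center)


# Hypothetical per-base cut counts with a dip at positions 4-7.
example_cuts = [8, 8, 8, 8, 1, 1, 1, 1, 8, 8, 8, 8]
example_score = footprint_depletion_score(example_cuts, 4, 8, flank=4)
```

On raw counts such a score is confounded by the enzyme's sequence preferences, which is why the abstract stresses modelling protocol artifacts before calling footprints.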
Affiliation(s)
- Zhijian Li
- Institute for Computational Genomics, Joint Research Center for Computational Biomedicine, RWTH Aachen University Medical School, Aachen, 52074 Germany
- Department of Cell Biology, Institute of Biomedical Engineering, RWTH Aachen University Medical School, Aachen, 52074 Germany
| | - Marcel H. Schulz
- Cluster of Excellence for Multimodal Computing and Interaction, Saarland Informatics Campus, Saarland University, Saarbrücken, Germany
- Computational Biology & Applied Algorithmics, Max Planck Institute for Informatics, Saarbrücken, Germany
- Institute for Cardiovascular Regeneration, Goethe University, Frankfurt am Main, Germany
- German Centre for Cardiovascular Research (DZHK), Partner site RheinMain, Frankfurt am Main, Germany
| | - Thomas Look
- Department of Cell Biology, Institute of Biomedical Engineering, RWTH Aachen University Medical School, Aachen, 52074 Germany
- Helmholtz Institute for Biomedical Engineering, RWTH Aachen University, Aachen, Germany
| | - Matthias Begemann
- Institute of Human Genetics, RWTH Aachen University Medical School, Aachen, Germany
| | - Martin Zenke
- Department of Cell Biology, Institute of Biomedical Engineering, RWTH Aachen University Medical School, Aachen, 52074 Germany
- Helmholtz Institute for Biomedical Engineering, RWTH Aachen University, Aachen, Germany
| | - Ivan G. Costa
- Institute for Computational Genomics, Joint Research Center for Computational Biomedicine, RWTH Aachen University Medical School, Aachen, 52074 Germany
- Helmholtz Institute for Biomedical Engineering, RWTH Aachen University, Aachen, Germany
|
40
|
Karabacak Calviello A, Hirsekorn A, Wurmus R, Yusuf D, Ohler U. Reproducible inference of transcription factor footprints in ATAC-seq and DNase-seq datasets using protocol-specific bias modeling. Genome Biol 2019; 20:42. [PMID: 30791920 PMCID: PMC6385462 DOI: 10.1186/s13059-019-1654-y] [Citation(s) in RCA: 55] [Impact Index Per Article: 9.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/18/2018] [Accepted: 02/13/2019] [Indexed: 01/01/2023] Open
Abstract
BACKGROUND: DNase-seq and ATAC-seq are broadly used methods to assay open chromatin regions genome-wide. The single nucleotide resolution of DNase-seq has been further exploited to infer transcription factor binding sites (TFBSs) in regulatory regions through footprinting. Recent studies have demonstrated the sequence bias of DNase I and its adverse effects on footprinting efficiency. However, footprinting and the impact of sequence bias have not been extensively studied for ATAC-seq. RESULTS: Here, we undertake a systematic comparison of the two methods and show that a modification to the ATAC-seq protocol increases its yield and its agreement with DNase-seq data from the same cell line. We demonstrate that the two methods have distinct sequence biases and correct for these protocol-specific biases when performing footprinting. Despite the differences in footprint shapes, the locations of the inferred footprints in ATAC-seq and DNase-seq are largely concordant. However, the protocol-specific sequence biases in conjunction with the sequence content of TFBSs impact the discrimination of footprint from the background, which leads to one method outperforming the other for some TFs. Finally, we address the depth required for reproducible identification of open chromatin regions and TF footprints. CONCLUSIONS: We demonstrate that the impact of bias correction on footprinting performance is greater for DNase-seq than for ATAC-seq and that DNase-seq footprinting leads to better performance. It is possible to infer concordant footprints by using replicates, highlighting the importance of reproducibility assessment. The results presented here provide an overview of the advantages and limitations of footprinting analyses using ATAC-seq and DNase-seq.
Affiliation(s)
- Aslıhan Karabacak Calviello
- Max Delbrück Center for Molecular Medicine, Berlin Institute for Medical Systems Biology, Berlin, Germany
- Department of Biology, Humboldt University, Berlin, Germany
| | - Antje Hirsekorn
- Max Delbrück Center for Molecular Medicine, Berlin Institute for Medical Systems Biology, Berlin, Germany
| | - Ricardo Wurmus
- Max Delbrück Center for Molecular Medicine, Berlin Institute for Medical Systems Biology, Berlin, Germany
| | - Dilmurat Yusuf
- Max Delbrück Center for Molecular Medicine, Berlin Institute for Medical Systems Biology, Berlin, Germany
| | - Uwe Ohler
- Max Delbrück Center for Molecular Medicine, Berlin Institute for Medical Systems Biology, Berlin, Germany.
- Department of Biology, Humboldt University, Berlin, Germany.
- Department of Computer Science, Humboldt University, Berlin, Germany.
|
41
|
Li H, Quang D, Guan Y. Anchor: trans-cell type prediction of transcription factor binding sites. Genome Res 2019; 29:281-292. [PMID: 30567711 PMCID: PMC6360811 DOI: 10.1101/gr.237156.118] [Citation(s) in RCA: 44] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/15/2018] [Accepted: 12/13/2018] [Indexed: 12/16/2022]
Abstract
The ENCyclopedia of DNA Elements (ENCODE) consortium has generated transcription factor (TF) binding ChIP-seq data covering hundreds of TF proteins and cell types; however, due to limits on time and resources, only a small fraction of all possible TF-cell type pairs have been profiled. One solution is to build machine learning models trained on currently available epigenomic data sets that can be applied to the remaining missing pairs. A major challenge is that TF binding sites are cell-type-specific, which can be attributed to cellular contexts such as chromatin accessibility. Meanwhile, indirect TF-DNA binding and interactions between TFs complicate this regulatory process. Technical issues such as sequencing biases and batch effects render the prediction task even more challenging. Many pioneering efforts have been made to predict TF binding profiles based on DNA sequence and DNase-seq footprints, but to what extent a model can be generalized to completely untested cell conditions remains unknown. In this study, we describe our first place solution to the 2017 ENCODE-DREAM in vivo TF binding site prediction challenge. By carefully addressing multisource biases and information imbalance across cell types, we created a pipeline that significantly outperforms the current state-of-the-art methods. The proposed method is sufficiently complex to model nonlinear interactions between TF binding motifs and chromatin accessibility information up to 1500 bp from the genomic region of interest.
Affiliation(s)
- Hongyang Li
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
| | - Daniel Quang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
| | - Yuanfang Guan
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
|
42
|
Keilwagen J, Posch S, Grau J. Accurate prediction of cell type-specific transcription factor binding. Genome Biol 2019; 20:9. [PMID: 30630522 PMCID: PMC6327544 DOI: 10.1186/s13059-018-1614-y] [Citation(s) in RCA: 56] [Impact Index Per Article: 9.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/25/2018] [Accepted: 12/18/2018] [Indexed: 01/11/2023] Open
Abstract
Prediction of cell type-specific, in vivo transcription factor binding sites is one of the central challenges in regulatory genomics. Here, we present our approach that earned a shared first rank in the "ENCODE-DREAM in vivo Transcription Factor Binding Site Prediction Challenge" in 2017. In post-challenge analyses, we benchmark the influence of different feature sets and find that chromatin accessibility and binding motifs are sufficient to yield state-of-the-art performance. Finally, we provide 682 lists of predicted peaks for a total of 31 transcription factors in 22 primary cell types and tissues and a user-friendly version of our approach, Catchitt, for download.
Affiliation(s)
- Jens Keilwagen
- Institute for Biosafety in Plant Biotechnology, Julius Kühn-Institut (JKI) - Federal Research Centre for Cultivated Plants, Erwin-Baur-Straße 27, Quedlinburg, 06484 Germany
| | - Stefan Posch
- Institute of Computer Science, Martin Luther University Halle–Wittenberg, Von-Seckendorff-Platz 1, Halle (Saale), 06120 Germany
| | - Jan Grau
- Institute of Computer Science, Martin Luther University Halle–Wittenberg, Von-Seckendorff-Platz 1, Halle (Saale), 06120 Germany
|
43
|
Pranzatelli TJF, Michael DG, Chiorini JA. ATAC2GRN: optimized ATAC-seq and DNase1-seq pipelines for rapid and accurate genome regulatory network inference. BMC Genomics 2018; 19:563. [PMID: 30064353 PMCID: PMC6069842 DOI: 10.1186/s12864-018-4943-z] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/29/2018] [Accepted: 07/16/2018] [Indexed: 01/07/2023] Open
Abstract
Background: Chromatin accessibility profiling assays such as ATAC-seq and DNase1-seq offer the opportunity to rapidly characterize the regulatory state of the genome at a single nucleotide resolution. Optimization of molecular protocols has enabled the molecular biologist to produce next-generation sequencing libraries in several hours, leaving the analysis of sequencing data as the primary obstacle to wide-scale deployment of accessibility profiling assays. To address this obstacle we have developed an optimized and efficient pipeline for the analysis of ATAC-seq and DNase1-seq data. Results: We executed a multi-dimensional grid-search on the NIH Biowulf supercomputing cluster to assess the impact of parameter selection on biological reproducibility and ChIP-seq recovery by analyzing 4560 pipeline configurations. Our analysis improved ChIP-seq recovery by 15% for ATAC-seq and 3% for DNase1-seq and determined that PCR duplicate removal improves biological reproducibility by 36% without significant costs in footprinting transcription factors. Our analyses of downsampled reads identified a point of diminishing returns for increased library sequencing depth, with 95% of the ChIP-seq data of a 200 million read footprinting library recovered by 160 million reads. Conclusions: We present optimized ATAC-seq and DNase-seq pipelines in both Snakemake and bash formats as well as optimal sequencing depths for ATAC-seq and DNase-seq projects. The optimized ATAC-seq and DNase1-seq analysis pipelines, parameters, and ground-truth ChIP-seq datasets have been made available for deployment and future algorithmic profiling.
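A multi-dimensional grid search of the kind mentioned in this abstract amounts to scoring every combination of parameter values. The sketch below shows only that enumeration pattern; the parameter names, their values, and the scoring function are hypothetical placeholders and are not taken from the ATAC2GRN pipeline.

```python
from itertools import product


def enumerate_configs(grid):
    """Yield one dict per combination of parameter values in `grid`.

    `grid` maps a parameter name to the list of values to try; the number
    of configurations is the product of the list lengths.
    """
    names = sorted(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def best_config(grid, score):
    """Return the configuration with the highest `score(config)`."""
    return max(enumerate_configs(grid), key=score)


# Hypothetical parameter grid; a real pipeline would score each configuration
# by running it and measuring, for example, ChIP-seq recovery.
example_grid = {
    "remove_duplicates": [True, False],
    "min_mapping_quality": [0, 10, 30],
    "footprint_pvalue": [0.05, 0.01],
}
all_configs = list(enumerate_configs(example_grid))  # 2 * 3 * 2 combinations
```

Because the number of configurations grows multiplicatively with each added parameter, searches of this kind are usually spread across a compute cluster, as the abstract reports.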
Affiliation(s)
- Thomas J F Pranzatelli
- National Institute of Dental and Craniofacial Research, National Institutes of Health, 10 Center Drive, Bethesda, MD, 20816, USA
| | - Drew G Michael
- National Institute of Dental and Craniofacial Research, National Institutes of Health, 10 Center Drive, Bethesda, MD, 20816, USA
| | - John A Chiorini
- National Institute of Dental and Craniofacial Research, National Institutes of Health, 10 Center Drive, Bethesda, MD, 20816, USA.
|
44
|
Genome-scale identification of transcription factors that mediate an inflammatory network during breast cellular transformation. Nat Commun 2018; 9:2068. [PMID: 29802342 PMCID: PMC5970197 DOI: 10.1038/s41467-018-04406-2] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/10/2017] [Accepted: 04/26/2018] [Indexed: 01/05/2023] Open
Abstract
Transient activation of Src oncoprotein in non-transformed, breast epithelial cells can initiate an epigenetic switch to the stably transformed state via a positive feedback loop that involves the inflammatory transcription factors STAT3 and NF-κB. Here, we develop an experimental and computational pipeline that includes 1) a Bayesian network model (AccessTF) that accurately predicts protein-bound DNA sequence motifs based on chromatin accessibility, and 2) a scoring system (TFScore) that rank-orders transcription factors as candidates for being important for a biological process. Genetic experiments validate TFScore and suggest that more than 40 transcription factors contribute to the oncogenic state in this model. Interestingly, individual depletion of several of these factors results in similar transcriptional profiles, indicating that a complex and interconnected transcriptional network promotes a stable oncogenic state. The combined experimental and computational pipeline represents a general approach to comprehensively identify transcriptional regulators important for a biological process.
|
45
|
Zirkel A, Nikolic M, Sofiadis K, Mallm JP, Brackley CA, Gothe H, Drechsel O, Becker C, Altmüller J, Josipovic N, Georgomanolis T, Brant L, Franzen J, Koker M, Gusmao EG, Costa IG, Ullrich RT, Wagner W, Roukos V, Nürnberg P, Marenduzzo D, Rippe K, Papantonis A. HMGB2 Loss upon Senescence Entry Disrupts Genomic Organization and Induces CTCF Clustering across Cell Types. Mol Cell 2018; 70:730-744.e6. [PMID: 29706538 DOI: 10.1016/j.molcel.2018.03.030] [Citation(s) in RCA: 153] [Impact Index Per Article: 21.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2017] [Revised: 02/19/2018] [Accepted: 03/25/2018] [Indexed: 11/30/2022]
Abstract
Processes like cellular senescence are characterized by complex events giving rise to heterogeneous cell populations. However, the early molecular events driving this cascade remain elusive. We hypothesized that senescence entry is triggered by an early disruption of the cells' three-dimensional (3D) genome organization. To test this, we combined Hi-C, single-cell and population transcriptomics, imaging, and in silico modeling of three distinct cell types entering senescence. Genes involved in DNA conformation maintenance are suppressed upon senescence entry across all cell types. We show that nuclear depletion of the abundant HMGB2 protein occurs early on the path to senescence and coincides with the dramatic spatial clustering of CTCF. Knocking down HMGB2 suffices for senescence-induced CTCF clustering and for loop reshuffling, while ectopically expressing HMGB2 rescues these effects. Our data suggest that HMGB2-mediated genomic reorganization constitutes a primer for the ensuing senescent program.
Affiliation(s)
- Anne Zirkel
- Center for Molecular Medicine Cologne, University of Cologne, 50931 Cologne, Germany
| | - Milos Nikolic
- Center for Molecular Medicine Cologne, University of Cologne, 50931 Cologne, Germany
| | - Konstantinos Sofiadis
- Center for Molecular Medicine Cologne, University of Cologne, 50931 Cologne, Germany
| | - Jan-Philipp Mallm
- German Cancer Research Center and Bioquant, 69120 Heidelberg, Germany
| | - Chris A Brackley
- School of Physics and Astronomy, University of Edinburgh, EH9 3FD Edinburgh, UK
| | - Henrike Gothe
- Institute of Molecular Biology, 55128 Mainz, Germany
| | | | - Christian Becker
- Cologne Center for Genomics, University of Cologne, 50931 Cologne, Germany
| | - Janine Altmüller
- Center for Molecular Medicine Cologne, University of Cologne, 50931 Cologne, Germany; Cologne Center for Genomics, University of Cologne, 50931 Cologne, Germany
| | - Natasa Josipovic
- Center for Molecular Medicine Cologne, University of Cologne, 50931 Cologne, Germany
| | | | - Lilija Brant
- Center for Molecular Medicine Cologne, University of Cologne, 50931 Cologne, Germany
| | - Julia Franzen
- Helmholtz Institute for Biomedical Engineering, RWTH Aachen University Medical School, 52074 Aachen, Germany
| | - Mirjam Koker
- Clinic I of Internal Medicine and Center for Integrated Oncology, University Hospital Cologne, 50931 Cologne, Germany
| | - Eduardo G Gusmao
- Center for Molecular Medicine Cologne, University of Cologne, 50931 Cologne, Germany; Interdisciplinary Centre for Clinical Research, RWTH Aachen University Medical School, 52062 Aachen, Germany
| | - Ivan G Costa
- Interdisciplinary Centre for Clinical Research, RWTH Aachen University Medical School, 52062 Aachen, Germany
| | - Roland T Ullrich
- Center for Molecular Medicine Cologne, University of Cologne, 50931 Cologne, Germany; Clinic I of Internal Medicine and Center for Integrated Oncology, University Hospital Cologne, 50931 Cologne, Germany
| | - Wolfgang Wagner
- Helmholtz Institute for Biomedical Engineering, RWTH Aachen University Medical School, 52074 Aachen, Germany
| | | | - Peter Nürnberg
- Center for Molecular Medicine Cologne, University of Cologne, 50931 Cologne, Germany; Cologne Center for Genomics, University of Cologne, 50931 Cologne, Germany; Cologne Excellence Cluster on Cellular Stress Responses in Aging-Associated Diseases (CECAD), University of Cologne, 50931 Cologne, Germany
| | - Davide Marenduzzo
- School of Physics and Astronomy, University of Edinburgh, EH9 3FD Edinburgh, UK
| | - Karsten Rippe
- German Cancer Research Center and Bioquant, 69120 Heidelberg, Germany
| | - Argyris Papantonis
- Center for Molecular Medicine Cologne, University of Cologne, 50931 Cologne, Germany.
|
46
|
Osato N. Characteristics of functional enrichment and gene expression level of human putative transcriptional target genes. BMC Genomics 2018; 19:957. [PMID: 29363429 PMCID: PMC5780744 DOI: 10.1186/s12864-017-4339-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/02/2023] Open
Abstract
BACKGROUND: Transcriptional target genes show functional enrichment. However, it is still unclear how many functional enrichments transcriptional target genes include and how significant those enrichments are. To address these issues, I predicted human transcriptional target genes using open chromatin regions, ChIP-seq data and DNA binding sequences of transcription factors in databases, and examined the functional enrichment and gene expression level of putative transcriptional target genes. RESULTS: Gene Ontology annotations showed four times as many functional enrichments in putative transcriptional target genes as were found using gene expression information alone, independent of transcriptional target genes. To compare the number of functional enrichments of putative transcriptional target genes between cells or search conditions, I normalized the number of functional enrichments by calculating its ratio to the total number of transcriptional target genes. With this analysis, native putative transcriptional target genes showed the largest normalized number of functional enrichments, compared with target genes including 5-60% of randomly selected genes. The normalized number of functional enrichments changed according to the criteria of enhancer-promoter interactions, such as distance from transcriptional start sites and orientation of CTCF-binding sites. Forward-reverse orientation of CTCF-binding sites showed a significantly higher normalized number of functional enrichments than the other orientations. Published papers showed that the five most frequent functional enrichments were related to cellular functions in the three cell types. The median expression level of transcriptional target genes changed according to the criteria of enhancer-promoter assignments (i.e. interactions) and was correlated with the changes in the normalized number of functional enrichments of transcriptional target genes.
CONCLUSIONS: Human putative transcriptional target genes showed significant functional enrichments. Functional enrichments were related to cellular functions. The normalized number of functional enrichments of human putative transcriptional target genes changed according to the criteria of enhancer-promoter assignments and correlated with the median expression level of the target genes. These analyses and characteristics of human putative transcriptional target genes would be useful for examining the criteria of enhancer-promoter assignments and for predicting novel mechanisms and factors of enhancer-promoter interactions, such as DNA-binding proteins and DNA sequences.
Affiliation(s)
- Naoki Osato
- Department of Bioinformatic Engineering, Graduate School of Information Science and Technology, Osaka University, Osaka, 565-0871, Japan.
|
47
|
Fu H, Yang L, Zhang X. Noncoding Variants Functional Prioritization Methods Based on Predicted Regulatory Factor Binding Sites. Curr Genomics 2017; 18:322-331. [PMID: 29081688 PMCID: PMC5635616 DOI: 10.2174/1389202918666170228143619] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/01/2016] [Revised: 10/16/2016] [Accepted: 11/02/2016] [Indexed: 12/31/2022] Open
Abstract
BACKGROUND: In the post-genomic era, research into the genetic mechanisms of disease has come to depend increasingly on studies of genes, gene networks and gene-protein interaction networks. To explore gene expression and regulation, researchers have carried out many studies on transcription factors and their binding sites (TFBSs). Based on the large number of TFBS values predicted by deep learning models, further computation and analysis have been done to reveal the relationship between gene mutation and the occurrence of disease. It has been demonstrated that deep learning methods outperform conventional methods in predicting the functions of noncoding variants. Research on predicting the functions of single nucleotide polymorphisms (SNPs) is expected to uncover the mechanisms by which gene mutations affect human traits and diseases. RESULTS: We reviewed the conventional TFBS identification methods from different perspectives. For the deep learning methods that predict TFBSs, we discussed related problems such as raw data preprocessing, the structural design of the deep convolutional neural network (CNN) and the measurement of model performance. We then summarized the techniques usually used to find functional noncoding variants from de novo sequence. CONCLUSION: With the rapid development of high-throughput assays, more sample data and chromatin features should help improve the prediction accuracy of deep convolutional neural networks for TFBS identification. Meanwhile, gaining more insight into the deep CNN framework itself has proved useful both for improving model performance and for developing designs better suited to the sample data. Based on the feature values predicted by the deep CNN model, the prioritization model for functional noncoding variants would help reveal the effect of gene mutation on disease.
Affiliation(s)
- Haoyue Fu
- College of Sciences, Northeastern University, Shenyang, China
| | - Lianping Yang
- College of Sciences, Northeastern University, Shenyang, China
- University of Southern California, Dept. Biol. Sci., Program Mol & Computat Biol, USA
| | - Xiangde Zhang
- College of Sciences, Northeastern University, Shenyang, China
|
48
|
Liu S, Zibetti C, Wan J, Wang G, Blackshaw S, Qian J. Assessing the model transferability for prediction of transcription factor binding sites based on chromatin accessibility. BMC Bioinformatics 2017; 18:355. [PMID: 28750606 PMCID: PMC5530957 DOI: 10.1186/s12859-017-1769-7] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/03/2017] [Accepted: 07/19/2017] [Indexed: 12/04/2022] Open
Abstract
Background: Computational prediction of transcription factor (TF) binding sites in different cell types is challenging. Recent technological developments allow us to determine genome-wide chromatin accessibility in various cellular and developmental contexts. The chromatin accessibility profiles provide useful information for predicting TF binding events in various physiological conditions. Furthermore, ChIP-Seq analysis has been used to determine genome-wide binding sites for a range of different TFs in multiple cell types. Integration of these two types of genomic information can improve the prediction of TF binding events. Results: We assessed to what extent a model built upon other TFs and/or other cell types could be used to predict the binding sites of TFs of interest. A random forest model was built using a set of cell type-independent features such as specific sequences recognized by the TFs and evolutionary conservation, as well as cell type-specific features derived from chromatin accessibility data. Our analysis suggested that the models learned from other TFs and/or cell lines performed almost as well as the model learned from the target TF in the cell type of interest. Interestingly, models based on multiple TFs performed better than single-TF models. Finally, we proposed a universal model, BPAC, which was generated using ChIP-Seq data from multiple TFs in various cell types. Conclusion: Integrating chromatin accessibility information with sequence information improves prediction of TF binding. The prediction of TF binding is transferable across TFs and/or cell lines, suggesting that there is a set of universal “rules”. A computational tool was developed to predict TF binding sites based on the universal “rules”.
Affiliation(s)
- Sheng Liu
  - Department of Ophthalmology, Johns Hopkins University School of Medicine, Baltimore, 21287, MD, USA
- Cristina Zibetti
  - Solomon H. Snyder Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, 21287, MD, USA
- Jun Wan
  - Department of Ophthalmology, Johns Hopkins University School of Medicine, Baltimore, 21287, MD, USA
- Guohua Wang
  - Department of Ophthalmology, Johns Hopkins University School of Medicine, Baltimore, 21287, MD, USA
- Seth Blackshaw
  - Department of Ophthalmology, Johns Hopkins University School of Medicine, Baltimore, 21287, MD, USA
  - Solomon H. Snyder Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, 21287, MD, USA
  - Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, 21287, MD, USA
  - Centre for Human Systems Biology, Johns Hopkins University School of Medicine, Baltimore, 21287, MD, USA
  - Institute for Cell Engineering, Johns Hopkins University School of Medicine, Baltimore, 21287, MD, USA
- Jiang Qian
  - Department of Ophthalmology, Johns Hopkins University School of Medicine, Baltimore, 21287, MD, USA
|
49
|
Kehl T, Schneider L, Schmidt F, Stöckel D, Gerstner N, Backes C, Meese E, Keller A, Schulz MH, Lenhof HP. RegulatorTrail: a web service for the identification of key transcriptional regulators. Nucleic Acids Res 2017; 45:W146-W153. [PMID: 28472408 PMCID: PMC5570139 DOI: 10.1093/nar/gkx350] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 02/10/2017] [Revised: 04/07/2017] [Accepted: 04/20/2017] [Indexed: 12/14/2022]
Abstract
Transcriptional regulators such as transcription factors and chromatin modifiers play a central role in most biological processes. Alterations in their activities have been observed in many diseases, e.g. cancer. Hence, it is of utmost importance to evaluate and assess the effects of transcriptional regulators on natural and pathogenic processes. Here, we present RegulatorTrail, a web service that provides rich functionality for the identification and prioritization of key transcriptional regulators that have a strong impact on, e.g. pathological processes. RegulatorTrail offers eight methods that use regulator binding information in combination with transcriptomic or epigenomic data to infer the most influential regulators. Our web service not only provides an intuitive web interface, but also a well-documented RESTful API that allows for a straightforward integration into third-party workflows. The presented case studies highlight the capabilities of our web service and demonstrate its potential for the identification of influential regulators: we successfully identified regulators that might explain the increased malignancy in metastatic melanoma compared to primary tumors, as well as important regulators in macrophages. RegulatorTrail is freely accessible at: https://regulatortrail.bioinf.uni-sb.de/.
Affiliation(s)
- Tim Kehl
  - Center for Bioinformatics, Saarland Informatics Campus, Saarland University, 66123 Saarbrücken, Germany
- Lara Schneider
  - Center for Bioinformatics, Saarland Informatics Campus, Saarland University, 66123 Saarbrücken, Germany
- Florian Schmidt
  - Center for Bioinformatics, Saarland Informatics Campus, Saarland University, 66123 Saarbrücken, Germany
  - Cluster of Excellence Multimodal Computing and Interaction, Saarland Informatics Campus, Saarland University, 66123 Saarbrücken, Germany
  - Max Planck Institute for Informatics, Saarland Informatics Campus, 66123 Saarbrücken, Germany
- Daniel Stöckel
  - Center for Bioinformatics, Saarland Informatics Campus, Saarland University, 66123 Saarbrücken, Germany
- Nico Gerstner
  - Center for Bioinformatics, Saarland Informatics Campus, Saarland University, 66123 Saarbrücken, Germany
- Christina Backes
  - Center for Bioinformatics, Saarland Informatics Campus, Saarland University, 66123 Saarbrücken, Germany
- Eckart Meese
  - Center for Bioinformatics, Saarland Informatics Campus, Saarland University, 66123 Saarbrücken, Germany
  - Human Genetics, Saarland University, 66421 Homburg, Germany
- Andreas Keller
  - Center for Bioinformatics, Saarland Informatics Campus, Saarland University, 66123 Saarbrücken, Germany
- Marcel H Schulz
  - Center for Bioinformatics, Saarland Informatics Campus, Saarland University, 66123 Saarbrücken, Germany
  - Cluster of Excellence Multimodal Computing and Interaction, Saarland Informatics Campus, Saarland University, 66123 Saarbrücken, Germany
  - Max Planck Institute for Informatics, Saarland Informatics Campus, 66123 Saarbrücken, Germany
- Hans-Peter Lenhof
  - Center for Bioinformatics, Saarland Informatics Campus, Saarland University, 66123 Saarbrücken, Germany
|
50
|
Chen X, Yu B, Carriero N, Silva C, Bonneau R. Mocap: large-scale inference of transcription factor binding sites from chromatin accessibility. Nucleic Acids Res 2017; 45:4315-4329. [PMID: 28334916 PMCID: PMC5416775 DOI: 10.1093/nar/gkx174] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 06/20/2016] [Revised: 02/28/2017] [Accepted: 03/06/2017] [Indexed: 12/21/2022]
Abstract
Differential binding of transcription factors (TFs) at cis-regulatory loci drives the differentiation and function of diverse cellular lineages. Understanding the regulatory interactions that underlie cell fate decisions requires characterizing TF binding sites (TFBS) across multiple cell types and conditions. Techniques such as ChIP-Seq can reveal genome-wide patterns of TF binding, but typically require laborious and costly experiments for each TF-cell-type (TFCT) condition of interest. Chromosomal accessibility assays can connect accessible chromatin in one cell type to many TFs through sequence motif mapping. Such methods, however, rarely take into account that the genomic context preferred by each factor differs from TF to TF, and from cell type to cell type. To address the differences in TF behaviors, we developed Mocap, a method that integrates chromatin accessibility, motif scores, TF footprints, CpG/GC content, evolutionary conservation and other factors in an ensemble of TFCT-specific classifiers. We show that integration of genomic features, such as CpG islands, improves TFBS prediction in some TFCT. Further, we describe a method for mapping new TFCT, for which no ChIP-Seq data exist, onto our ensemble of classifiers and show that our cross-sample TFBS prediction method outperforms several previously described methods.
Affiliation(s)
- Xi Chen
  - Department of Biology, New York University, New York, NY 10003, USA
- Bowen Yu
  - Department of Computer Science, New York University, New York, NY 10003, USA
- Nicholas Carriero
  - Center for Computational Biology, Flatiron Foundation, Simons Foundation, New York, NY 10010, USA
- Claudio Silva
  - Department of Computer Science, New York University, New York, NY 10003, USA
- Richard Bonneau
  - Department of Biology, New York University, New York, NY 10003, USA
  - Department of Computer Science, New York University, New York, NY 10003, USA
  - Center for Computational Biology, Flatiron Foundation, Simons Foundation, New York, NY 10010, USA
|