1. Sathian R, Dutta P, Ay F, Davuluri RV. Genomic Language Model for Predicting Enhancers and Their Allele-Specific Activity in the Human Genome. bioRxiv 2025:2025.03.18.644040. [PMID: 40166250 PMCID: PMC11957021 DOI: 10.1101/2025.03.18.644040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/02/2025]
Abstract
Predicting and deciphering the regulatory logic of enhancers is a challenging problem, due to the intricate sequence features and lack of consistent genetic or epigenetic signatures that can accurately discriminate enhancers from other genomic regions. Recent machine-learning-based methods have spotlighted the importance of extracting the nucleotide composition of enhancers but failed to learn the sequence context and perform suboptimally. Motivated by advances in genomic language models, we developed DNABERT-Enhancer, a novel enhancer prediction method, by applying the DNABERT pre-trained language model to the human genome. We trained two different models, using a large collection of enhancers curated from the ENCODE registry of candidate cis-Regulatory Elements. The best fine-tuned model achieved 88.05% accuracy with a Matthews correlation coefficient of 76% on independent set-aside data. Further, we present the analysis of the predicted enhancers for all chromosomes of the human genome by comparing them with the enhancer regions reported in publicly available databases. Finally, we applied DNABERT-Enhancer along with other DNABERT-based regulatory genomic region prediction models to predict candidate SNPs with allele-specific enhancer and transcription factor binding activity. The genome-wide enhancer annotations and candidate loss-of-function genetic variants predicted by DNABERT-Enhancer provide valuable resources for genome interpretation in functional and clinical genomics studies.

2. Mulero-Hernández J, Mironov V, Miñarro-Giménez JA, Kuiper M, Fernández-Breis J. Integration of chromosome locations and functional aspects of enhancers and topologically associating domains in knowledge graphs enables versatile queries about gene regulation. Nucleic Acids Res 2024; 52:e69. [PMID: 38967009 PMCID: PMC11347148 DOI: 10.1093/nar/gkae566] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2023] [Revised: 06/12/2024] [Accepted: 06/19/2024] [Indexed: 07/06/2024] Open
Abstract
Knowledge about transcription factor binding and regulation, target genes, cis-regulatory modules and topologically associating domains is not only defined by functional associations like biological processes or diseases but also has a determinative genome location aspect. Here, we exploit these location and functional aspects together to develop new strategies to enable advanced data querying. Many databases have been developed to provide information about enhancers, but a schema that allows the standardized representation of data, securing interoperability between resources, has been lacking. In this work, we use knowledge graphs for the standardized representation of enhancers and topologically associating domains, together with data about their target genes, transcription factors, location on the human genome, and functional data about diseases and gene ontology annotations. We used this schema to integrate twenty-five enhancer datasets and two domain datasets, creating the most powerful integrative resource in this field to date. The knowledge graphs have been implemented using the Resource Description Framework and integrated within the open-access BioGateway knowledge network, generating a resource that contains an interoperable set of knowledge graphs (enhancers, TADs, genes, proteins, diseases, GO terms, and interactions between domains). We show how advanced queries, which combine functional and location restrictions, can be used to develop new hypotheses about functional aspects of gene expression regulation.
Affiliation(s)
- Juan Mulero-Hernández
- Departamento de Informática y Sistemas, Universidad de Murcia, CEIR Campus Mare Nostrum, Instituto Murciano de Investigación Biosanitaria (IMIB), 30100 Murcia, Spain
- Vladimir Mironov
- Department of Biology, Norwegian University of Science and Technology, NO-7491 Trondheim, Norway
- José Antonio Miñarro-Giménez
- Departamento de Informática y Sistemas, Universidad de Murcia, CEIR Campus Mare Nostrum, Instituto Murciano de Investigación Biosanitaria (IMIB), 30100 Murcia, Spain
- Martin Kuiper
- Department of Biology, Norwegian University of Science and Technology, NO-7491 Trondheim, Norway
- Jesualdo Tomás Fernández-Breis
- Departamento de Informática y Sistemas, Universidad de Murcia, CEIR Campus Mare Nostrum, Instituto Murciano de Investigación Biosanitaria (IMIB), 30100 Murcia, Spain

3. Zhu Z, Zhou Q, Sun Y, Lai F, Wang Z, Hao Z, Li G. MethMarkerDB: a comprehensive cancer DNA methylation biomarker database. Nucleic Acids Res 2024; 52:D1380-D1392. [PMID: 37889076 PMCID: PMC10767949 DOI: 10.1093/nar/gkad923] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 08/15/2023] [Revised: 09/21/2023] [Accepted: 10/10/2023] [Indexed: 10/28/2023] Open
Abstract
DNA methylation plays a crucial role in tumorigenesis and tumor progression, sparking substantial interest in the clinical applications of cancer DNA methylation biomarkers. Cancer-related whole-genome bisulfite sequencing (WGBS) data offers a promising approach to precisely identify these biomarkers with differentially methylated regions (DMRs). However, currently there is no dedicated resource for cancer DNA methylation biomarkers with WGBS data. Here, we developed a comprehensive cancer DNA methylation biomarker database (MethMarkerDB, https://methmarkerdb.hzau.edu.cn/), which integrated 658 WGBS datasets, incorporating 724 curated DNA methylation biomarker genes from 1425 PubMed published articles. Based on WGBS data, we documented 5.4 million DMRs from 13 common types of cancer as candidate DNA methylation biomarkers. We provided search and annotation functions for these DMRs with different resources, such as enhancers and SNPs, and developed diagnostic and prognostic models for further biomarker evaluation. With the database, we not only identified known DNA methylation biomarkers, but also identified 781 hypermethylated and 5245 hypomethylated pan-cancer DMRs, corresponding to 693 and 2172 genes, respectively. These novel potential pan-cancer DNA methylation biomarkers hold significant clinical translational value. We hope that MethMarkerDB will help identify novel cancer DNA methylation biomarkers and propel the clinical application of these biomarkers.
Affiliation(s)
- Zhixian Zhu
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China
- Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, Key Laboratory of Smart Farming for Agricultural Animals, 3D Genomics Research Center, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Qiangwei Zhou
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China
- Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, Key Laboratory of Smart Farming for Agricultural Animals, 3D Genomics Research Center, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Yuanhui Sun
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China
- Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, Key Laboratory of Smart Farming for Agricultural Animals, 3D Genomics Research Center, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Fuming Lai
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China
- Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, Key Laboratory of Smart Farming for Agricultural Animals, 3D Genomics Research Center, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Zhenji Wang
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China
- Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, Key Laboratory of Smart Farming for Agricultural Animals, 3D Genomics Research Center, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Zhigang Hao
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China
- Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, Key Laboratory of Smart Farming for Agricultural Animals, 3D Genomics Research Center, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Guoliang Li
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China
- Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, Key Laboratory of Smart Farming for Agricultural Animals, 3D Genomics Research Center, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China

4. Wang Q, Zhang J, Liu Z, Duan Y, Li C. Integrative approaches based on genomic techniques in the functional studies on enhancers. Brief Bioinform 2023; 25:bbad442. [PMID: 38048082 PMCID: PMC10694556 DOI: 10.1093/bib/bbad442] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 08/28/2023] [Revised: 10/22/2023] [Accepted: 11/08/2023] [Indexed: 12/05/2023] Open
Abstract
With the development of sequencing technology and the dramatic drop in sequencing cost, the functions of noncoding genes are being characterized in a wide variety of fields (e.g. biomedicine). Enhancers are noncoding DNA elements with vital transcription regulation functions. Tens of thousands of enhancers have been identified in the human genome; however, the location, function, target genes and regulatory mechanisms of most enhancers have not been elucidated thus far. As high-throughput sequencing techniques have leapt forwards, omics approaches have been extensively employed in enhancer research. Multidimensional genomic data integration enables the full exploration of the data and provides novel perspectives for screening, identification and characterization of the function and regulatory mechanisms of unknown enhancers. However, multidimensional genomic data are still difficult to integrate genome wide due to complex varieties, massive amounts, high rarity, etc. To facilitate the appropriate methods for studying enhancers with high efficacy, we delineate the principles, data processing modes and progress of various omics approaches to study enhancers and summarize the applications of traditional machine learning and deep learning in multi-omics integration in the enhancer field. In addition, the challenges encountered during the integration of multiple omics data are addressed. Overall, this review provides a comprehensive foundation for enhancer analysis.
Affiliation(s)
- Qilin Wang
- School of Engineering Medicine, Beihang University, Beijing 100191, China
- School of Biological Science and Medical Engineering, Beihang University, Beijing 100191, China
- Junyou Zhang
- School of Engineering Medicine, Beihang University, Beijing 100191, China
- School of Biological Science and Medical Engineering, Beihang University, Beijing 100191, China
- Zhaoshuo Liu
- School of Engineering Medicine, Beihang University, Beijing 100191, China
- School of Biological Science and Medical Engineering, Beihang University, Beijing 100191, China
- Yingying Duan
- School of Engineering Medicine, Beihang University, Beijing 100191, China
- School of Biological Science and Medical Engineering, Beihang University, Beijing 100191, China
- Chunyan Li
- School of Engineering Medicine, Beihang University, Beijing 100191, China
- School of Biological Science and Medical Engineering, Beihang University, Beijing 100191, China
- Key Laboratory of Big Data-Based Precision Medicine (Ministry of Industry and Information Technology), Beihang University, Beijing 100191, China
- Beijing Advanced Innovation Center for Big Data-Based Precision Medicine, Beihang University, Beijing 100191, China

5. Phan LT, Oh C, He T, Manavalan B. A comprehensive revisit of the machine-learning tools developed for the identification of enhancers in the human genome. Proteomics 2023; 23:e2200409. [PMID: 37021401 DOI: 10.1002/pmic.202200409] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/10/2023] [Revised: 03/18/2023] [Accepted: 03/27/2023] [Indexed: 04/07/2023]
Abstract
Enhancers are non-coding DNA elements that play a crucial role in enhancing the transcription rate of a specific gene in the genome. Experiments for identifying enhancers can be restricted by their conditions and involve complicated, time-consuming, laborious, and costly steps. To overcome these challenges, computational platforms have been developed to complement experimental methods that enable high-throughput identification of enhancers. Over the last few years, the development of various enhancer computational tools has resulted in significant progress in predicting putative enhancers. Thus, researchers are now able to use a variety of strategies to enhance and advance enhancer study. In this review, an overview of machine learning (ML)-based prediction methods for enhancer identification and related databases has been provided. The existing enhancer-prediction methods have also been reviewed regarding their algorithms, feature selection processes, validation techniques, and software utility. In addition, the advantages and drawbacks of these ML approaches and guidelines for developing bioinformatic tools have been highlighted for a more efficient enhancer prediction. This review will serve as a useful resource for experimentalists in selecting the appropriate ML tool for their study, and for bioinformaticians in developing more accurate and advanced ML-based predictors.
Affiliation(s)
- Le Thi Phan
- Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, Gyeonggi-do, South Korea
- Changmin Oh
- Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, Gyeonggi-do, South Korea
- Tao He
- Beidahuang Industry Group General Hospital, Harbin, China
- Balachandran Manavalan
- Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, Gyeonggi-do, South Korea

6. Zhou Q, Cheng S, Zheng S, Wang Z, Guan P, Zhu Z, Huang X, Zhou C, Li G. ChromLoops: a comprehensive database for specific protein-mediated chromatin loops in diverse organisms. Nucleic Acids Res 2023; 51:D57-D69. [PMID: 36243984 PMCID: PMC9825580 DOI: 10.1093/nar/gkac893] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2022] [Revised: 09/14/2022] [Accepted: 10/03/2022] [Indexed: 01/29/2023] Open
Abstract
Chromatin loops (or chromatin interactions) are important elements of chromatin structures. Disruption of chromatin loops is associated with many diseases, such as cancer and polydactyly. A few methods, including ChIA-PET, HiChIP and PLAC-Seq, have been proposed to detect high-resolution, specific protein-mediated chromatin loops. With rapid progress in 3D genomic research, ChIA-PET, HiChIP and PLAC-Seq datasets continue to accumulate, and effective collection and processing for these datasets are urgently needed. Here, we developed a comprehensive, multispecies and specific protein-mediated chromatin loop database (ChromLoops, https://3dgenomics.hzau.edu.cn/chromloops), which integrated 1030 ChIA-PET, HiChIP and PLAC-Seq datasets from 13 species, and documented 1 491 416 813 high-quality chromatin loops. We annotated genes and regions overlapping with chromatin loop anchors with rich functional annotations, such as regulatory elements (enhancers, super-enhancers and silencers), variations (common SNPs, somatic SNPs and eQTLs), and transcription factor binding sites. Moreover, we identified genes with high-frequency chromatin interactions in the collected species. In particular, we identified genes with high-frequency interactions in cancer samples. We hope that ChromLoops will provide a new platform for studying chromatin interaction regulation in relation to biological processes and disease.
Affiliation(s)
- Qiangwei Zhou
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China; Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, Key Laboratory of Smart Farming for Agricultural Animals, 3D Genomics Research Center, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Sheng Cheng
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China; Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, Key Laboratory of Smart Farming for Agricultural Animals, 3D Genomics Research Center, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Shanshan Zheng
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China; Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, Key Laboratory of Smart Farming for Agricultural Animals, 3D Genomics Research Center, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Zhenji Wang
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China; Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, Key Laboratory of Smart Farming for Agricultural Animals, 3D Genomics Research Center, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Pengpeng Guan
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China; Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, Key Laboratory of Smart Farming for Agricultural Animals, 3D Genomics Research Center, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Zhixian Zhu
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China; Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, Key Laboratory of Smart Farming for Agricultural Animals, 3D Genomics Research Center, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Xingyu Huang
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China; Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, Key Laboratory of Smart Farming for Agricultural Animals, 3D Genomics Research Center, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Cong Zhou
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China; Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, Key Laboratory of Smart Farming for Agricultural Animals, 3D Genomics Research Center, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Guoliang Li
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China; Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, Key Laboratory of Smart Farming for Agricultural Animals, 3D Genomics Research Center, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China

7. Panahi-Moghadam S, Hassani S, Farivar S, Vakhshiteh F. Emerging Role of Enhancer RNAs as Potential Diagnostic and Prognostic Biomarkers in Cancer. Noncoding RNA 2022; 8:ncrna8050066. [PMID: 36287118 PMCID: PMC9607539 DOI: 10.3390/ncrna8050066] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2022] [Revised: 09/28/2022] [Accepted: 09/29/2022] [Indexed: 11/05/2022] Open
Abstract
Enhancers are distal cis-acting elements that are commonly recognized to regulate gene expression via cooperation with promoters. Along with regulating gene expression, enhancers can be transcribed and generate a class of non-coding RNAs called enhancer RNAs (eRNAs). The current discovery of abundant tissue-specific transcription of enhancers in various diseases such as cancers raises questions about the potential role of eRNAs in disease diagnosis and therapy. This review aimed to demonstrate the current understanding of eRNAs in cancer research with a focus on the potential roles of eRNAs as prognostic and diagnostic biomarkers in cancers.
Affiliation(s)
- Somayeh Panahi-Moghadam
- Department of Genetics, Faculty of Biological Sciences, Tarbiat Modares University, Tehran 1411713116, Iran
- Department of Cell and Molecular Biology, Faculty of Life Sciences and Biotechnology, Shahid Beheshti University, Tehran 1983969411, Iran
- Shokoufeh Hassani
- Department of Toxicology and Pharmacology, Faculty of Pharmacy, Tehran University of Medical Sciences, Tehran 1417614411, Iran
- Toxicology and Diseases Group, Pharmaceutical Sciences Research Center (PSRC), The Institute of Pharmaceutical Sciences (TIPS), Tehran University of Medical Sciences (TUMS), Tehran 1417614411, Iran
- Shirin Farivar
- Department of Cell and Molecular Biology, Faculty of Life Sciences and Biotechnology, Shahid Beheshti University, Tehran 1983969411, Iran
- Faezeh Vakhshiteh
- Oncopathology Research Center, Iran University of Medical Sciences (IUMS), Tehran 1449614535, Iran

8. Mulero Hernández J, Fernández-Breis JT. Analysis of the landscape of human enhancer sequences in biological databases. Comput Struct Biotechnol J 2022; 20:2728-2744. [PMID: 35685360 PMCID: PMC9168495 DOI: 10.1016/j.csbj.2022.05.045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2022] [Revised: 05/20/2022] [Accepted: 05/21/2022] [Indexed: 12/01/2022] Open
Abstract
The process of gene regulation extends as a network in which both genetic sequences and proteins are involved. The levels of regulation and the mechanisms involved are multiple. Transcription is the main control mechanism for most genes, being the downstream steps responsible for refining the transcription patterns. In turn, gene transcription is mainly controlled by regulatory events that occur at promoters and enhancers. Several studies are focused on analyzing the contribution of enhancers in the development of diseases and their possible use as therapeutic targets. The study of regulatory elements has advanced rapidly in recent years with the development and use of next generation sequencing techniques. All this information has generated a large volume of information that has been transferred to a growing number of public repositories that store this information. In this article, we analyze the content of those public repositories that contain information about human enhancers with the aim of detecting whether the knowledge generated by scientific research is contained in those databases in a way that could be computationally exploited. The analysis will be based on three main aspects identified in the literature: types of enhancers, type of evidence about the enhancers, and methods for detecting enhancer-promoter interactions. Our results show that no single database facilitates the optimal exploitation of enhancer data, most types of enhancers are not represented in the databases and there is need for a standardized model for enhancers. We have identified major gaps and challenges for the computational exploitation of enhancer data.
Affiliation(s)
- Juan Mulero Hernández
- Dept. Informática y Sistemas, Universidad de Murcia, CEIR Campus Mare Nostrum, IMIB-Arrixaca, Spain

9. Ni P, Su Z. PCRMS: a database of predicted cis-regulatory modules and constituent transcription factor binding sites in genomes. Database (Oxford) 2022; 2022:6572594. [PMID: 35452518 PMCID: PMC9216522 DOI: 10.1093/database/baac024] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/11/2021] [Revised: 02/20/2022] [Accepted: 04/12/2022] [Indexed: 01/13/2023]
Abstract
More accurate and more complete predictions of cis-regulatory modules (CRMs) and constituent transcription factor (TF) binding sites (TFBSs) in genomes can facilitate characterizing functions of regulatory sequences. Here, we developed a database, predicted cis-regulatory modules (PCRMS) (https://cci-bioinfo.uncc.edu), that stores highly accurate and unprecedentedly complete maps of predicted CRMs and TFBSs in the human and mouse genomes. The web interface allows the user to browse CRMs and TFBSs in an organism, find the closest CRMs to a gene, search CRMs around a gene and find all TFBSs of a TF. PCRMS can be a useful resource for the research community to characterize regulatory genomes. Database URL: https://cci-bioinfo.uncc.edu/.
Affiliation(s)
- Pengyu Ni
- Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, 9201 University City Boulevard, Charlotte, NC 28223, USA
- Zhengchang Su
- Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, 9201 University City Boulevard, Charlotte, NC 28223, USA

10. Gao T, Zheng Z, Pan Y, Zhu C, Wei F, Yuan J, Sun R, Fang S, Wang N, Zhou Y, Qian J. scEnhancer: a single-cell enhancer resource with annotation across hundreds of tissue/cell types in three species. Nucleic Acids Res 2021; 50:D371-D379. [PMID: 34761274 PMCID: PMC8728125 DOI: 10.1093/nar/gkab1032] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 08/15/2021] [Revised: 10/04/2021] [Accepted: 10/19/2021] [Indexed: 12/14/2022] Open
Abstract
Previous studies on enhancers and their target genes were largely based on bulk samples that represent ‘average’ regulatory activities from a large population of millions of cells, masking the heterogeneity and important effects from the sub-populations. In recent years, single-cell sequencing technology has enabled the profiling of open chromatin accessibility at the single-cell level (scATAC-seq), which can be used to annotate the enhancers and promoters in specific cell types. A comprehensive resource is highly desirable for exploring how the enhancers regulate the target genes at the single-cell level. Hence, we designed a single-cell database scEnhancer (http://enhanceratlas.net/scenhancer/), covering 14 527 776 enhancers and 63 658 600 enhancer-gene interactions from 1 196 906 single cells across 775 tissue/cell types in three species. An unsupervised learning method was employed to sort and combine tens or hundreds of single cells in each tissue/cell type to obtain the consensus enhancers. In addition, we utilized a cis-regulatory network algorithm to identify the enhancer-gene connections. Finally, we provided a user-friendly platform with seven useful modules to search, visualize, and browse the enhancers/genes. This database will facilitate the research community towards a functional analysis of enhancers at the single-cell level.
Affiliation(s)
- Tianshun Gao
- Big Data Center, The Seventh Affiliated Hospital of Sun Yat-sen University, Shenzhen 518107, P.R. China; Scientific Research Center, The Seventh Affiliated Hospital of Sun Yat-sen University, Shenzhen 518107, P.R. China
- Zilong Zheng
- Big Data Center, The Seventh Affiliated Hospital of Sun Yat-sen University, Shenzhen 518107, P.R. China
- Yihang Pan
- Big Data Center, The Seventh Affiliated Hospital of Sun Yat-sen University, Shenzhen 518107, P.R. China; Scientific Research Center, The Seventh Affiliated Hospital of Sun Yat-sen University, Shenzhen 518107, P.R. China
- Chengming Zhu
- Scientific Research Center, The Seventh Affiliated Hospital of Sun Yat-sen University, Shenzhen 518107, P.R. China
- Fuxin Wei
- Department of Orthopaedics, The Seventh Affiliated Hospital of Sun Yat-sen University, Shenzhen 518107, P.R. China
- Jinqiu Yuan
- Big Data Center, The Seventh Affiliated Hospital of Sun Yat-sen University, Shenzhen 518107, P.R. China; Scientific Research Center, The Seventh Affiliated Hospital of Sun Yat-sen University, Shenzhen 518107, P.R. China
- Rui Sun
- Big Data Center, The Seventh Affiliated Hospital of Sun Yat-sen University, Shenzhen 518107, P.R. China; Scientific Research Center, The Seventh Affiliated Hospital of Sun Yat-sen University, Shenzhen 518107, P.R. China
- Shuo Fang
- Big Data Center, The Seventh Affiliated Hospital of Sun Yat-sen University, Shenzhen 518107, P.R. China; Department of Oncology, The Seventh Affiliated Hospital of Sun Yat-sen University, Shenzhen 518107, P.R. China
| | - Nan Wang
- Scientific Research Center, The Seventh Affiliated Hospital of Sun Yat-sen University, Shenzhen 518107, P.R. China
| | - Yang Zhou
- Big Data Center, The Seventh Affiliated Hospital of Sun Yat-sen University, Shenzhen 518107, P.R. China
| | - Jiang Qian
- The Wilmer Eye Institute, Johns Hopkins School of Medicine, Baltimore, MD 21231, USA.,The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins School of Medicine, Baltimore, MD 21205, USA
| |
|
11
|
Baumgarten N, Schmidt F, Wegner M, Hebel M, Kaulich M, Schulz MH. Computational prediction of CRISPR-impaired non-coding regulatory regions. Biol Chem 2021; 402:973-982. [PMID: 33660495 DOI: 10.1515/hsz-2020-0392] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/22/2020] [Accepted: 02/18/2021] [Indexed: 12/14/2022]
Abstract
Genome-wide CRISPR screens are becoming more widespread and allow the simultaneous interrogation of thousands of genomic regions. Although recent progress has been made in the analysis of CRISPR screens, it is still an open problem how to interpret CRISPR mutations in non-coding regions of the genome. Most of the tools concentrate on the interpretation of mutations introduced in gene coding regions. We introduce a computational pipeline that uses epigenomic information about regulatory elements for the interpretation of CRISPR mutations in non-coding regions. We illustrate our analysis protocol on the analysis of a genome-wide CRISPR screen in hTERT-RPE1 cells and reveal novel regulatory elements that mediate chemoresistance against doxorubicin in these cells. We infer links to established and to novel chemoresistance genes. Our analysis protocol is general and can be applied on any cell type and with different CRISPR enzymes.
Affiliation(s)
- Nina Baumgarten
  - Institute for Cardiovascular Regeneration, Goethe University, 60590 Frankfurt am Main, Germany
  - German Center for Cardiovascular Research, Partner site Rhein-Main, 60590 Frankfurt am Main, Germany
  - Cluster of Excellence MMCI, Saarland University, and Max Planck Institute for Informatics, Saarland Informatics Campus, 66123 Saarbrücken, Germany
  - Cardiopulmonary Institute (CPI), Goethe University, 60590 Frankfurt am Main, Germany
- Florian Schmidt
  - Institute for Cardiovascular Regeneration, Goethe University, 60590 Frankfurt am Main, Germany
  - German Center for Cardiovascular Research, Partner site Rhein-Main, 60590 Frankfurt am Main, Germany
  - Cluster of Excellence MMCI, Saarland University, and Max Planck Institute for Informatics, Saarland Informatics Campus, 66123 Saarbrücken, Germany
  - Laboratory of Systems Biology and Data Analytics, Genome Institute of Singapore, 60 Biopolis Street, 138672, Singapore, Singapore
- Martin Wegner
  - Institute of Biochemistry II, Goethe University - Medical Faculty, University Hospital, 60590 Frankfurt am Main, Germany
  - Frankfurt Cancer Institute, Goethe University, 60590 Frankfurt am Main, Germany
- Marie Hebel
  - Institute of Biochemistry II, Goethe University - Medical Faculty, University Hospital, 60590 Frankfurt am Main, Germany
  - Frankfurt Cancer Institute, Goethe University, 60590 Frankfurt am Main, Germany
- Manuel Kaulich
  - Institute of Biochemistry II, Goethe University - Medical Faculty, University Hospital, 60590 Frankfurt am Main, Germany
  - Frankfurt Cancer Institute, Goethe University, 60590 Frankfurt am Main, Germany
- Marcel H Schulz
  - Institute for Cardiovascular Regeneration, Goethe University, 60590 Frankfurt am Main, Germany
  - German Center for Cardiovascular Research, Partner site Rhein-Main, 60590 Frankfurt am Main, Germany
  - Cluster of Excellence MMCI, Saarland University, and Max Planck Institute for Informatics, Saarland Informatics Campus, 66123 Saarbrücken, Germany
  - Cardiopulmonary Institute (CPI), Goethe University, 60590 Frankfurt am Main, Germany
|
12
|
Baumgarten N, Hecker D, Karunanithi S, Schmidt F, List M, Schulz MH. EpiRegio: analysis and retrieval of regulatory elements linked to genes. Nucleic Acids Res 2020; 48:W193-W199. [PMID: 32459338 PMCID: PMC7319550 DOI: 10.1093/nar/gkaa382] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 03/11/2020] [Revised: 04/21/2020] [Accepted: 05/04/2020] [Indexed: 12/26/2022] Open
Abstract
A current challenge in genomics is to interpret non-coding regions and their role in transcriptional regulation of possibly distant target genes. Genome-wide association studies show that a large part of genomic variants are found in those non-coding regions, but their mechanisms of gene regulation are often unknown. An additional challenge is to reliably identify the target genes of the regulatory regions, which is an essential step in understanding their impact on gene expression. Here we present the EpiRegio web server, a resource of regulatory elements (REMs). REMs are genomic regions that exhibit variations in their chromatin accessibility profile associated with changes in expression of their target genes. EpiRegio incorporates both epigenomic and gene expression data for various human primary cell types and tissues, providing an integrated view of REMs in the genome. Our web server allows the analysis of genes and their associated REMs, including the REM's activity and its estimated cell type-specific contribution to its target gene's expression. Further, it is possible to explore genomic regions for their regulatory potential, investigate overlapping REMs and by that the dissection of regions of large epigenomic complexity. EpiRegio allows programmatic access through a REST API and is freely available at https://epiregio.de/.
Affiliation(s)
- Nina Baumgarten
  - Institute for Cardiovascular Regeneration, Goethe University Hospital, 60590 Frankfurt am Main, Germany
  - Cardio-Pulmonary Institute, Goethe University Hospital, 60590 Frankfurt am Main, Germany
  - German Center for Cardiovascular Research, Partner site Rhein-Main, 60590 Frankfurt am Main, Germany
  - Cluster of Excellence, Multimodal Computing and Interaction, Saarland Informatics Campus, 66123 Saarbrücken, Germany
- Dennis Hecker
  - Institute for Cardiovascular Regeneration, Goethe University Hospital, 60590 Frankfurt am Main, Germany
  - Cardio-Pulmonary Institute, Goethe University Hospital, 60590 Frankfurt am Main, Germany
  - German Center for Cardiovascular Research, Partner site Rhein-Main, 60590 Frankfurt am Main, Germany
- Sivarajan Karunanithi
  - Institute for Cardiovascular Regeneration, Goethe University Hospital, 60590 Frankfurt am Main, Germany
  - Cardio-Pulmonary Institute, Goethe University Hospital, 60590 Frankfurt am Main, Germany
  - German Center for Cardiovascular Research, Partner site Rhein-Main, 60590 Frankfurt am Main, Germany
- Florian Schmidt
  - German Center for Cardiovascular Research, Partner site Rhein-Main, 60590 Frankfurt am Main, Germany
  - Cluster of Excellence, Multimodal Computing and Interaction, Saarland Informatics Campus, 66123 Saarbrücken, Germany
  - Genome Institute of Singapore, 60 Biopolis Street, Genome, 02-01, 138672, Singapore
- Markus List
  - Big Data in BioMedicine Group, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Maximus-von-Imhof-Forum 3, 85354 Freising, Germany
- Marcel H Schulz
  - Institute for Cardiovascular Regeneration, Goethe University Hospital, 60590 Frankfurt am Main, Germany
  - Cardio-Pulmonary Institute, Goethe University Hospital, 60590 Frankfurt am Main, Germany
  - German Center for Cardiovascular Research, Partner site Rhein-Main, 60590 Frankfurt am Main, Germany
  - Cluster of Excellence, Multimodal Computing and Interaction, Saarland Informatics Campus, 66123 Saarbrücken, Germany
|
13
|
Gao T, Qian J. EnhancerAtlas 2.0: an updated resource with enhancer annotation in 586 tissue/cell types across nine species. Nucleic Acids Res 2020; 48:D58-D64. [PMID: 31740966 PMCID: PMC7145677 DOI: 10.1093/nar/gkz980] [Citation(s) in RCA: 130] [Impact Index Per Article: 26.0] [Received: 08/15/2019] [Revised: 10/02/2019] [Accepted: 10/31/2019] [Indexed: 02/06/2023] Open
Abstract
Enhancers are distal cis-regulatory elements that activate the transcription of their target genes. They regulate a wide range of important biological functions and processes, including embryogenesis, development, and homeostasis. As more and more large-scale technologies were developed for enhancer identification, a comprehensive database is highly desirable for enhancer annotation based on various genome-wide profiling datasets across different species. Here, we present an updated database EnhancerAtlas 2.0 (http://www.enhanceratlas.org/indexv2.php), covering 586 tissue/cell types that include a large number of normal tissues, cancer cell lines, and cells at different development stages across nine species. Overall, the database contains 13 494 603 enhancers, which were obtained from 16 055 datasets using 12 high-throughput experiment methods (e.g. H3K4me1/H3K27ac, DNase-seq/ATAC-seq, P300, POLR2A, CAGE, ChIA-PET, GRO-seq, STARR-seq and MPRA). The updated version is a huge expansion of the first version, which only contains the enhancers in human cells. In addition, we predicted enhancer–target gene relationships in human, mouse and fly. Finally, the users can search enhancers and enhancer–target gene relationships through five user-friendly, interactive modules. We believe the new annotation of enhancers in EnhancerAtlas 2.0 will facilitate users to perform useful functional analysis of enhancers in various genomes.
Affiliation(s)
- Tianshun Gao
  - The Wilmer Eye Institute, Johns Hopkins School of Medicine, Baltimore, MD 21231, USA
- Jiang Qian
  - The Wilmer Eye Institute, Johns Hopkins School of Medicine, Baltimore, MD 21231, USA
  - The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins School of Medicine, Baltimore, MD 21205, USA
|
14
|
Bai X, Shi S, Ai B, Jiang Y, Liu Y, Han X, Xu M, Pan Q, Wang F, Wang Q, Zhang J, Li X, Feng C, Li Y, Wang Y, Song Y, Feng K, Li C. ENdb: a manually curated database of experimentally supported enhancers for human and mouse. Nucleic Acids Res 2020; 48:D51-D57. [PMID: 31665430 PMCID: PMC7145688 DOI: 10.1093/nar/gkz973] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 08/15/2019] [Revised: 10/10/2019] [Accepted: 10/16/2019] [Indexed: 12/30/2022] Open
Abstract
Enhancers are a class of cis-regulatory elements that can increase gene transcription by forming loops in intergenic regions, introns and exons. Enhancers, as well as their associated target genes, and transcription factors (TFs) that bind to them, are highly associated with human disease and biological processes. Although some enhancer databases have been published, most only focus on enhancers identified by high-throughput experimental techniques. Therefore, it is highly desirable to construct a comprehensive resource of manually curated enhancers and their related information based on low-throughput experimental evidences. Here, we established a comprehensive manually-curated enhancer database for human and mouse, which provides a resource for experimentally supported enhancers, and to annotate the detailed information of enhancers. The current release of ENdb documents 737 experimentally validated enhancers and their related information, including 384 target genes, 263 TFs, 110 diseases and 153 functions in human and mouse. Moreover, the enhancer-related information was supported by experimental evidences, such as RNAi, in vitro knockdown, western blotting, qRT-PCR, luciferase reporter assay, chromatin conformation capture (3C) and chromosome conformation capture-on-chip (4C) assays. ENdb provides a user-friendly interface to query, browse and visualize the detailed information of enhancers. The database is available at http://www.licpathway.net/ENdb.
Affiliation(s)
- Xuefeng Bai
  - School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Shanshan Shi
  - School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Bo Ai
  - School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Yong Jiang
  - School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Yuejuan Liu
  - School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Xiaole Han
  - School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Mingcong Xu
  - School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Qi Pan
  - School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Fan Wang
  - School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Qiuyu Wang
  - School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Jian Zhang
  - School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Xuecang Li
  - School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Chenchen Feng
  - School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Yanyu Li
  - School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Yuezhu Wang
  - School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Yiwei Song
  - School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Ke Feng
  - School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Chunquan Li
  - School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
|
15
|
Hariprakash JM, Ferrari F. Computational Biology Solutions to Identify Enhancers-target Gene Pairs. Comput Struct Biotechnol J 2019; 17:821-831. [PMID: 31316726 PMCID: PMC6611831 DOI: 10.1016/j.csbj.2019.06.012] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 03/15/2019] [Revised: 06/04/2019] [Accepted: 06/11/2019] [Indexed: 12/12/2022] Open
Abstract
Enhancers are non-coding regulatory elements that are distant from their target gene. Their characterization still remains elusive especially due to challenges in achieving a comprehensive pairing of enhancers and target genes. A number of computational biology solutions have been proposed to address this problem leveraging the increasing availability of functional genomics data and the improved mechanistic understanding of enhancer action. In this review we focus on computational methods for genome-wide definition of enhancer-target gene pairs. We outline the different classes of methods, as well as their main advantages and limitations. The types of information integrated by each method, along with details on their applicability are presented and discussed. We especially highlight the technical challenges that are still unresolved and hamper the effective achievement of a satisfactory and comprehensive solution. We expect this field will keep evolving in the coming years due to the ever-growing availability of data and increasing insights into enhancers' crucial role in regulating genome functionality.
Affiliation(s)
- Francesco Ferrari
  - IFOM, The FIRC Institute of Molecular Oncology, Milan, Italy
  - Institute of Molecular Genetics, National Research Council, Pavia, Italy
|