1
Zhang W, Zhang M, Zhu M. RAEPI: Predicting Enhancer-Promoter Interactions Based on Restricted Attention Mechanism. Interdiscip Sci 2025; 17:153-165. [PMID: 39546160 DOI: 10.1007/s12539-024-00669-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2023] [Revised: 10/02/2024] [Accepted: 10/09/2024] [Indexed: 11/17/2024]
Abstract
Enhancer-promoter interactions (EPIs) are crucial in gene transcription regulation and cell differentiation. Traditional biological experiments are costly and time-consuming, motivating the development of computational prediction methods. However, existing EPI prediction methods inadequately capture the intricate direct interactions between enhancer and promoter sequences, which limits their prediction performance. In this work, we propose RAEPI, an attention-based approach that uses convolutional neural networks to extract initial features of enhancers and promoters, combined with a specially designed Restricted Attention mechanism with constrained Query-Key-Value to simulate the interactions between them for further feature extraction. To improve cross-cell line prediction, we employ a transfer learning strategy for pre-training. Furthermore, we extracted sequence motifs to evaluate RAEPI's effectiveness from a visualization perspective. Experimental results show that RAEPI achieves prediction performance competitive with existing methods on the benchmark dataset.
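As a rough illustration of how a Query-Key-Value attention step can relate enhancer features to promoter features, the sketch below implements ordinary scaled dot-product cross-attention in plain Python. It is not the RAEPI Restricted Attention mechanism, whose constraints the abstract does not specify; the function names, the absence of learned projection matrices, and the toy feature vectors are all assumptions made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Plain scaled dot-product cross-attention (hypothetical helper).

    queries: feature vectors from one sequence (e.g. enhancer positions)
    keys, values: feature vectors from the other sequence (e.g. promoter positions)
    Returns one attended vector per query.
    """
    d = len(keys[0])
    attended = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors
        attended.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return attended


# Toy features (assumed): one enhancer position, two promoter positions
enhancer = [[1.0, 0.0]]
promoter = [[1.0, 0.0], [0.0, 1.0]]
out = cross_attention(enhancer, promoter, promoter)
```

In this toy case the enhancer vector is most similar to the first promoter vector, so the attended output leans towards it.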
Affiliation(s)
- Wanjing Zhang
  - College of Computer Science, Sichuan University, Chengdu, 610065, China
- Mingyang Zhang
  - College of Computer Science, Sichuan University, Chengdu, 610065, China
- Min Zhu
  - College of Computer Science, Sichuan University, Chengdu, 610065, China.
2
Wall BPG, Nguyen M, Harrell JC, Dozmorov MG. Machine and Deep Learning Methods for Predicting 3D Genome Organization. Methods Mol Biol 2025; 2856:357-400. [PMID: 39283464 DOI: 10.1007/978-1-0716-4136-1_22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/25/2024]
Abstract
Three-dimensional (3D) chromatin interactions, such as enhancer-promoter interactions (EPIs), loops, topologically associating domains (TADs), and A/B compartments, play critical roles in a wide range of cellular processes by regulating gene expression. Recent development of chromatin conformation capture technologies has enabled genome-wide profiling of various 3D structures, even with single cells. However, current catalogs of 3D structures remain incomplete and unreliable due to differences in technology, tools, and low data resolution. Machine learning methods have emerged as an alternative to obtain missing 3D interactions and/or improve resolution. Such methods frequently use genome annotation data (ChIP-seq, DNAse-seq, etc.), DNA sequencing information (k-mers and transcription factor binding site (TFBS) motifs), and other genomic properties to learn the associations between genomic features and chromatin interactions. In this review, we discuss computational tools for predicting three types of 3D interactions (EPIs, chromatin interactions, and TAD boundaries) and analyze their pros and cons. We also point out obstacles to the computational prediction of 3D interactions and suggest future research directions.
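The k-mer sequence features mentioned above can be sketched as follows. This is a minimal illustration, not code from any of the reviewed tools: the function names are hypothetical, and skipping windows that contain symbols other than A, C, G, or T is an assumed convention.

```python
from collections import Counter


def kmer_counts(sequence, k):
    """Count overlapping k-mers in a DNA sequence.

    Windows containing anything other than A, C, G, or T (for example N)
    are skipped; this handling is an assumption of the sketch.
    """
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if set(window) <= set("ACGT"):
            counts[window] += 1
    return counts


def kmer_frequencies(sequence, k):
    """Normalize k-mer counts so the values sum to 1 (empty dict if no valid k-mer)."""
    counts = kmer_counts(sequence, k)
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


counts = kmer_counts("ACGTAC", 2)
```

A frequency vector of this kind, computed for each candidate region, is one simple way to turn raw sequence into fixed-length input for a classifier.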
Affiliation(s)
- Brydon P G Wall
  - Center for Biological Data Science, Virginia Commonwealth University, Richmond, VA, USA
- My Nguyen
  - Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, USA
- J Chuck Harrell
  - Department of Pathology, Virginia Commonwealth University, Richmond, VA, USA
  - Massey Comprehensive Cancer Center, Virginia Commonwealth University, Richmond, VA, USA
  - Center for Pharmaceutical Engineering, Virginia Commonwealth University, Richmond, VA, USA
- Mikhail G Dozmorov
  - Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, USA.
  - Department of Pathology, Virginia Commonwealth University, Richmond, VA, USA.
3
Wang Z, Yuan H, Yan J, Liu J. Identification, characterization, and design of plant genome sequences using deep learning. The Plant Journal: for Cell and Molecular Biology 2025; 121:e17190. [PMID: 39666835 DOI: 10.1111/tpj.17190] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2024] [Revised: 11/11/2024] [Accepted: 11/23/2024] [Indexed: 12/14/2024]
Abstract
Due to its excellent performance in processing large amounts of data and capturing complex non-linear relationships, deep learning has been widely applied in many fields of plant biology. Here we first review the application of deep learning in analyzing genome sequences to predict gene expression, chromatin interactions, and epigenetic features (open chromatin, transcription factor binding sites, and methylation sites) in plants. Then, current motif mining and functional component design and synthesis based on generative adversarial networks, large models, and attention mechanisms are elaborated in detail. The progress of protein structure and function prediction, genomic prediction, and large model applications based on deep learning is also discussed. Finally, this work provides prospects for the future development of deep learning in plants with regard to multiple omics data, algorithm optimization, large language models, sequence design, and intelligent breeding.
Affiliation(s)
- Zhenye Wang
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
  - Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, 430070, China
  - College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Hao Yuan
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
  - Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, 430070, China
  - College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Jianbing Yan
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
  - Hubei Hongshan Laboratory, Wuhan, 430070, China
- Jianxiao Liu
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
  - Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, 430070, China
  - College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
  - Hubei Hongshan Laboratory, Wuhan, 430070, China
4
Tahir M, Norouzi M, Khan SS, Davie JR, Yamanaka S, Ashraf A. Artificial intelligence and deep learning algorithms for epigenetic sequence analysis: A review for epigeneticists and AI experts. Comput Biol Med 2024; 183:109302. [PMID: 39500240 DOI: 10.1016/j.compbiomed.2024.109302] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2024] [Revised: 09/22/2024] [Accepted: 10/17/2024] [Indexed: 11/20/2024]
Abstract
Epigenetics encompasses mechanisms that can alter the expression of genes without changing the underlying genetic sequence. The epigenetic regulation of gene expression is initiated and sustained by several mechanisms such as DNA methylation, histone modifications, chromatin conformation, and non-coding RNA. The changes in gene regulation and expression can manifest in the form of various diseases and disorders such as cancer and congenital deformities. Over the last few decades, high-throughput experimental approaches have been used to identify and understand epigenetic changes, but these laboratory experimental approaches and biochemical processes are time-consuming and expensive. To overcome these challenges, machine learning and artificial intelligence (AI) approaches have been extensively used for mapping epigenetic modifications to their phenotypic manifestations. In this paper we provide a narrative review of published research on AI models trained on epigenomic data to address a variety of problems such as prediction of disease markers, gene expression, enhancer-promoter interaction, and chromatin states. The purpose of this review is twofold as it is addressed to both AI experts and epigeneticists. For AI researchers, we provided a taxonomy of epigenetics research problems that can benefit from an AI-based approach. For epigeneticists, given each of the above problems we provide a list of candidate AI solutions in the literature. We have also identified several gaps in the literature, research challenges, and recommendations to address these challenges.
Affiliation(s)
- Muhammad Tahir
  - Department of Electrical and Computer Engineering, University of Manitoba, Winnipeg, R3T 5V6, MB, Canada
- Mahboobeh Norouzi
  - Department of Electrical and Computer Engineering, University of Manitoba, Winnipeg, R3T 5V6, MB, Canada
- Shehroz S Khan
  - College of Engineering and Technology, American University of the Middle East, Kuwait
- James R Davie
  - Department of Biochemistry and Medical Genetics, Max Rady College of Medicine, Rady Faculty of Health Sciences, University of Manitoba, Winnipeg, MB, Canada
- Soichiro Yamanaka
  - Graduate School of Science, Department of Biophysics and Biochemistry, University of Tokyo, Japan
- Ahmed Ashraf
  - Department of Electrical and Computer Engineering, University of Manitoba, Winnipeg, R3T 5V6, MB, Canada.
5
Garmany A, Terzic A. Artificial intelligence powers regenerative medicine into predictive realm. Regen Med 2024; 19:611-616. [PMID: 39660914 PMCID: PMC11703382 DOI: 10.1080/17460751.2024.2437281] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2024] [Accepted: 11/29/2024] [Indexed: 12/12/2024] Open
Abstract
The expanding regenerative medicine toolkit is reaching a record number of lives. There is a pressing need to enhance the precision, efficiency, and effectiveness of regenerative approaches and achieve reliable outcomes. While regenerative medicine has relied on an empiric paradigm, availability of big data along with advances in informatics and artificial intelligence offer the opportunity to inform the next generation of regenerative sciences along the discovery, translation, and application pathway. Artificial intelligence can streamline discovery and development of optimized biotherapeutics by aiding in the interpretation of readouts associated with optimal repair outcomes. In advanced biomanufacturing, artificial intelligence holds potential in ensuring quality control and assuring scalability through automated monitoring of process-critical variables mandatory for product consistency. In practice application, artificial intelligence can guide clinical trial design, patient selection, delivery strategies, and outcome assessment. As artificial intelligence transforms the regenerative horizon, caution is necessary to reduce bias, ensure generalizability, and mitigate ethical concerns with the goal of equitable access for patients and populations.
Affiliation(s)
- Armin Garmany
  - Marriott Heart Disease Research Program, Department of Cardiovascular Medicine, Department of Molecular Pharmacology & Experimental Therapeutics, Department of Clinical Genomics, Mayo Clinic, Rochester, MN, USA
  - Mayo Clinic Alix School of Medicine, Regenerative Sciences Track, Mayo Clinic Graduate School of Biomedical Sciences, Mayo Clinic, Rochester, MN, USA
- Andre Terzic
  - Marriott Heart Disease Research Program, Department of Cardiovascular Medicine, Department of Molecular Pharmacology & Experimental Therapeutics, Department of Clinical Genomics, Mayo Clinic, Rochester, MN, USA
6
Liu B, Zhang W, Zeng X, Loza M, Park SJ, Nakai K. TF-EPI: an interpretable enhancer-promoter interaction detection method based on Transformer. Front Genet 2024; 15:1444459. [PMID: 39184348 PMCID: PMC11341371 DOI: 10.3389/fgene.2024.1444459] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2024] [Accepted: 07/24/2024] [Indexed: 08/27/2024] Open
Abstract
The detection of enhancer-promoter interactions (EPIs) is crucial for understanding gene expression regulation, disease mechanisms, and more. In this study, we developed TF-EPI, a deep learning model based on Transformer designed to detect these interactions solely from DNA sequences. The performance of TF-EPI surpassed that of other state-of-the-art methods on multiple benchmark datasets. Importantly, by utilizing the attention mechanism of the Transformer, we identified distinct cell type-specific motifs and sequences in enhancers and promoters, which were validated against databases such as JASPAR and UniBind, highlighting the potential of our method in discovering new biological insights. Moreover, our analysis of the transcription factors (TFs) corresponding to these motifs and short sequence pairs revealed the heterogeneity and commonality of gene regulatory mechanisms and demonstrated the ability to identify TFs relevant to the source information of the cell line. Finally, the introduction of transfer learning can mitigate the challenges posed by cell type-specific gene regulation, yielding enhanced accuracy in cross-cell line EPI detection. Overall, our work unveils important sequence information for the investigation of enhancer-promoter pairs based on the attention mechanism of the Transformer, providing an important milestone in the investigation of cis-regulatory grammar.
Affiliation(s)
- Bowen Liu
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, Tokyo, Japan
- Weihang Zhang
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, Tokyo, Japan
- Xin Zeng
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, Tokyo, Japan
- Martin Loza
  - Human Genome Center, Institute of Medical Science, University of Tokyo, Tokyo, Japan
- Sung-Joon Park
  - Human Genome Center, Institute of Medical Science, University of Tokyo, Tokyo, Japan
- Kenta Nakai
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, Tokyo, Japan
  - Human Genome Center, Institute of Medical Science, University of Tokyo, Tokyo, Japan
7
Ahmed FS, Aly S, Liu X. EPI-Trans: an effective transformer-based deep learning model for enhancer promoter interaction prediction. BMC Bioinformatics 2024; 25:216. [PMID: 38890584 PMCID: PMC11184834 DOI: 10.1186/s12859-024-05784-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2023] [Accepted: 04/15/2024] [Indexed: 06/20/2024] Open
Abstract
BACKGROUND: Recognition of enhancer-promoter interactions (EPIs) is crucial for human development. EPIs in the genome play a key role in regulating transcription. However, experimental approaches for classifying EPIs are too expensive in terms of effort, time, and resources. Therefore, a growing number of studies develop computational techniques, particularly deep learning and other machine learning techniques, to address this problem. Unfortunately, the majority of current computational methods are based on convolutional neural networks, recurrent neural networks, or a combination of them, which do not take into consideration contextual details and the long-range interactions between the enhancer and promoter sequences. A new transformer-based model called EPI-Trans is presented in this study to overcome these limitations. The multi-head attention mechanism in the transformer model automatically learns features that represent the long-range interrelationships between enhancer and promoter sequences. Furthermore, a generic model is created with transferability that can be utilized as a pre-trained model for various cell lines. Moreover, the parameters of the generic model are fine-tuned using a particular cell line dataset to improve performance. RESULTS: Based on the results obtained from six benchmark cell lines, the average AUROC for the specific, generic, and best models is 94.2%, 95%, and 95.7%, while the average AUPR is 80.5%, 66.1%, and 79.6%, respectively. CONCLUSIONS: This study proposed a transformer-based deep learning model for EPI prediction. The comparative results on certain cell lines show that EPI-Trans outperforms other cutting-edge techniques and can provide superior performance on the challenge of recognizing EPIs.
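For readers unfamiliar with the two reported metrics, the following sketch computes AUROC and AUPR (as average precision) from labels and scores in plain Python. It is an illustration only, not the evaluation code used for EPI-Trans; the function names and the toy data are assumptions. The two metrics can diverge noticeably when positive pairs are rare, because AUPR depends on precision while AUROC does not.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """AUPR estimated as average precision: mean precision at each positive's rank.

    Tied scores are ordered arbitrarily by the sort; this is a simplification.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


# Toy example (assumed data): 1 = interacting pair, 0 = non-interacting pair
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
roc = auroc(labels, scores)
ap = average_precision(labels, scores)
```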
Affiliation(s)
- Fatma S Ahmed
  - Department of Computer Science and Technology, Xiamen University, Xiamen, 361005, China.
  - Department of Electrical Engineering, Aswan University, Aswan, 81542, Egypt.
- Saleh Aly
  - Department of Electrical Engineering, Aswan University, Aswan, 81542, Egypt.
  - Department of Information Technology, Majmaah University, 11952, Majmaah, Saudi Arabia.
- Xiangrong Liu
  - Department of Computer Science and Technology, Xiamen University, Xiamen, 361005, China
8
Tenekeci S, Tekir S. Identifying promoter and enhancer sequences by graph convolutional networks. Comput Biol Chem 2024; 110:108040. [PMID: 38430611 DOI: 10.1016/j.compbiolchem.2024.108040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2023] [Revised: 01/09/2024] [Accepted: 02/27/2024] [Indexed: 03/05/2024]
Abstract
Identification of promoters, enhancers, and their interactions helps understand genetic regulation. This study proposes a graph-based semi-supervised learning model (GCN4EPI) for the enhancer-promoter classification problem. We adopt a graph convolutional network (GCN) architecture to integrate interaction information with sequence features. Nodes of the constructed graph hold word embeddings of DNA sequences while edges hold the Enhancer-Promoter Interaction (EPI) information. By means of semi-supervised learning, much less data (16%) and time are needed in model training. Comparisons on a benchmark dataset of six human cell lines show that the proposed approach outperforms the state-of-the-art methods by a large margin (10% higher F1 score) and has the fastest training time (up to 3 times). Moreover, GCN4EPI's performance on cross-cell line data is also better than the baselines (3% higher F1 score). Our qualitative analyses with graph explainability models prove that GCN4EPI learns from both text and graph structure. The results suggest that integrating interaction information with sequence features improves predictive performance and compensates for the number of training instances.
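To make the idea of combining node features with interaction edges concrete, the sketch below implements one standard graph convolution step, ReLU(Â H W) with Â the symmetrically normalized adjacency matrix with self-loops, in plain Python. Whether GCN4EPI uses exactly this propagation rule is an assumption; the function names, the toy graph, and the identity weight matrix are purely illustrative.

```python
import math


def normalized_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a non-negative adjacency matrix A."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def gcn_layer(adj, features, weights):
    """One graph convolution step: aggregate neighbours, transform, apply ReLU."""
    propagated = matmul(normalized_adjacency(adj), features)
    transformed = matmul(propagated, weights)
    return [[max(0.0, x) for x in row] for row in transformed]


# Toy graph (assumed): node 0 is an enhancer, node 1 a promoter, one edge between them
adj = [[0.0, 1.0], [1.0, 0.0]]
features = [[1.0, 0.0], [0.0, 1.0]]   # stand-ins for sequence embeddings
weights = [[1.0, 0.0], [0.0, 1.0]]    # identity instead of learned weights
hidden = gcn_layer(adj, features, weights)
```

After one step each node's representation is an average of its own features and its neighbour's, which is how interaction information can reach the sequence features.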
Affiliation(s)
- Samet Tenekeci
  - Department of Computer Engineering, Izmir Institute of Technology, Izmir, 35430, Turkiye
- Selma Tekir
  - Department of Computer Engineering, Izmir Institute of Technology, Izmir, 35430, Turkiye.
9
Wall BPG, Nguyen M, Harrell JC, Dozmorov MG. Machine and deep learning methods for predicting 3D genome organization. arXiv 2024; arXiv:2403.03231v1. [PMID: 38495565 PMCID: PMC10942493] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2024]
Abstract
Three-Dimensional (3D) chromatin interactions, such as enhancer-promoter interactions (EPIs), loops, Topologically Associating Domains (TADs), and A/B compartments play critical roles in a wide range of cellular processes by regulating gene expression. Recent development of chromatin conformation capture technologies has enabled genome-wide profiling of various 3D structures, even with single cells. However, current catalogs of 3D structures remain incomplete and unreliable due to differences in technology, tools, and low data resolution. Machine learning methods have emerged as an alternative to obtain missing 3D interactions and/or improve resolution. Such methods frequently use genome annotation data (ChIP-seq, DNAse-seq, etc.), DNA sequencing information (k-mers, Transcription Factor Binding Site (TFBS) motifs), and other genomic properties to learn the associations between genomic features and chromatin interactions. In this review, we discuss computational tools for predicting three types of 3D interactions (EPIs, chromatin interactions, TAD boundaries) and analyze their pros and cons. We also point out obstacles of computational prediction of 3D interactions and suggest future research directions.
Affiliation(s)
- Brydon P. G. Wall
  - Center for Biological Data Science, Virginia Commonwealth University, Richmond, VA, 23284, USA
- My Nguyen
  - Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, 23298, USA
- J. Chuck Harrell
  - Department of Pathology, Virginia Commonwealth University, Richmond, VA, 23284, USA
  - Massey Comprehensive Cancer Center, Virginia Commonwealth University, Richmond, VA 23298, USA
  - Center for Pharmaceutical Engineering, Virginia Commonwealth University, Richmond, VA 23298, USA
- Mikhail G. Dozmorov
  - Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, 23298, USA
  - Department of Pathology, Virginia Commonwealth University, Richmond, VA, 23284, USA
10
Wang Q, Zhang J, Liu Z, Duan Y, Li C. Integrative approaches based on genomic techniques in the functional studies on enhancers. Brief Bioinform 2023; 25:bbad442. [PMID: 38048082 PMCID: PMC10694556 DOI: 10.1093/bib/bbad442] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Revised: 10/22/2023] [Accepted: 11/08/2023] [Indexed: 12/05/2023] Open
Abstract
With the development of sequencing technology and the dramatic drop in sequencing cost, the functions of noncoding genes are being characterized in a wide variety of fields (e.g. biomedicine). Enhancers are noncoding DNA elements with vital transcription regulation functions. Tens of thousands of enhancers have been identified in the human genome; however, the location, function, target genes and regulatory mechanisms of most enhancers have not been elucidated thus far. As high-throughput sequencing techniques have leapt forwards, omics approaches have been extensively employed in enhancer research. Multidimensional genomic data integration enables the full exploration of the data and provides novel perspectives for screening, identification and characterization of the function and regulatory mechanisms of unknown enhancers. However, multidimensional genomic data are still difficult to integrate genome wide due to complex varieties, massive amounts, high rarity, etc. To facilitate the appropriate methods for studying enhancers with high efficacy, we delineate the principles, data processing modes and progress of various omics approaches to study enhancers and summarize the applications of traditional machine learning and deep learning in multi-omics integration in the enhancer field. In addition, the challenges encountered during the integration of multiple omics data are addressed. Overall, this review provides a comprehensive foundation for enhancer analysis.
Affiliation(s)
- Qilin Wang
  - School of Engineering Medicine, Beihang University, Beijing 100191, China
  - School of Biological Science and Medical Engineering, Beihang University, Beijing 100191, China
- Junyou Zhang
  - School of Engineering Medicine, Beihang University, Beijing 100191, China
  - School of Biological Science and Medical Engineering, Beihang University, Beijing 100191, China
- Zhaoshuo Liu
  - School of Engineering Medicine, Beihang University, Beijing 100191, China
  - School of Biological Science and Medical Engineering, Beihang University, Beijing 100191, China
- Yingying Duan
  - School of Engineering Medicine, Beihang University, Beijing 100191, China
  - School of Biological Science and Medical Engineering, Beihang University, Beijing 100191, China
- Chunyan Li
  - School of Engineering Medicine, Beihang University, Beijing 100191, China
  - School of Biological Science and Medical Engineering, Beihang University, Beijing 100191, China
  - Key Laboratory of Big Data-Based Precision Medicine (Ministry of Industry and Information Technology), Beihang University, Beijing 100191, China
  - Beijing Advanced Innovation Center for Big Data-Based Precision Medicine, Beihang University, Beijing 100191, China
11
Umarov R, Hon CC. Enhancer target prediction: state-of-the-art approaches and future prospects. Biochem Soc Trans 2023; 51:1975-1988. [PMID: 37830459 DOI: 10.1042/bst20230917] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Revised: 10/02/2023] [Accepted: 10/02/2023] [Indexed: 10/14/2023]
Abstract
Enhancers are genomic regions that regulate gene transcription and are located far away from the transcription start sites of their target genes. Enhancers are highly enriched in disease-associated variants and thus deciphering the interactions between enhancers and genes is crucial to understanding the molecular basis of genetic predispositions to diseases. Experimental validations of enhancer targets can be laborious. Computational methods have thus emerged as a valuable alternative for studying enhancer-gene interactions. A variety of computational methods have been developed to predict enhancer targets by incorporating genomic features (e.g. conservation, distance, and sequence), epigenomic features (e.g. histone marks and chromatin contacts) and activity measurements (e.g. covariations of enhancer activity and gene expression). With the recent advances in genome perturbation and chromatin conformation capture technologies, data on experimentally validated enhancer targets are becoming available for supervised training of these methods and evaluation of their performance. In this review, we categorize enhancer target prediction methods based on their rationales and approaches. Then we discuss their merits and limitations and highlight the future directions for enhancer targets prediction.
Affiliation(s)
- Ramzan Umarov
  - RIKEN Centre for Integrative Medical Sciences, Yokohama RIKEN Institute, Yokohama, Japan
- Chung-Chau Hon
  - RIKEN Centre for Integrative Medical Sciences, Yokohama RIKEN Institute, Yokohama, Japan
12
Zhang P, Wu H. IChrom-Deep: An Attention-Based Deep Learning Model for Identifying Chromatin Interactions. IEEE J Biomed Health Inform 2023; 27:4559-4568. [PMID: 37402191 DOI: 10.1109/jbhi.2023.3292299] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Indexed: 07/06/2023]
Abstract
Identification of chromatin interactions is crucial for advancing our knowledge of gene regulation. However, due to the limitations of high-throughput experimental techniques, there is an urgent need to develop computational methods for predicting chromatin interactions. In this study, we propose a novel attention-based deep learning model, termed IChrom-Deep, to identify chromatin interactions using sequence features and genomic features. The experimental results based on the datasets of three cell lines demonstrate that the IChrom-Deep achieves satisfactory performance and is superior to the previous methods. We also investigate the effect of DNA sequence and associated features and genomic features on chromatin interactions, and highlight the applicable scenarios of some features, such as sequence conservation and distance. Moreover, we identify a few genomic features that are extremely important across different cell lines, and IChrom-Deep achieves comparable performance with only these significant genomic features versus using all genomic features. It is believed that IChrom-Deep can serve as a useful tool for future studies that seek to identify chromatin interactions.
13
Zhao X, Song L, Yang A, Zhang Z, Zhang J, Yang YT, Zhao XM. Prioritizing genes associated with brain disorders by leveraging enhancer-promoter interactions in diverse neural cells and tissues. Genome Med 2023; 15:56. [PMID: 37488639 PMCID: PMC10364416 DOI: 10.1186/s13073-023-01210-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2023] [Accepted: 07/10/2023] [Indexed: 07/26/2023] Open
Abstract
BACKGROUND: Prioritizing genes that underlie complex brain disorders poses a considerable challenge. Although previous studies have found that these disorders share symptoms and heterogeneity, it has remained difficult to systematically identify the risk genes associated with them. METHODS: Using the CAGE (Cap Analysis of Gene Expression) read alignment files for 439 human cell and tissue types (including primary cells, tissues and cell lines) from the FANTOM5 project, we predicted enhancer-promoter interactions (EPIs) in these 439 human cell and tissue types and examined their reliability. Then we evaluated the genetic heritability of 17 diverse brain disorders and behavioral-cognitive phenotypes in each neural cell type, brain region, and developmental stage. Furthermore, we prioritized genes associated with brain disorders and phenotypes by leveraging the EPIs in each neural cell and tissue type, and analyzed their pleiotropy and functionality for different categories of disorders and phenotypes. Finally, we characterized the spatiotemporal expression dynamics of these associated genes in cells and tissues. RESULTS: We found that the identified EPIs showed activity specificity and network aggregation in cell and tissue types, and that enriched TF binding in neural cells played key roles in synaptic plasticity and nerve cell development, i.e., EGR1 and the SOX family. We also discovered that most neurological disorders exhibit heritability enrichment in neural stem cells and astrocytes, while psychiatric disorders and behavioral-cognitive phenotypes exhibit enrichment in neurons. Furthermore, our identified genes recapitulated well-known risk genes, which exhibited widespread pleiotropy between psychiatric disorders and behavioral-cognitive phenotypes (i.e., FOXP2), and indicated expression specificity in neural cell types, brain regions, and developmental stages associated with disorders and phenotypes. Importantly, we showed the potential associations of brain disorders with brain regions and developmental stages that have not been well studied. CONCLUSIONS: Overall, our study characterized the gene-enhancer regulatory networks and genetic mechanisms in human neural cells and tissues, and illustrated the value of reanalysis of publicly available genomic datasets.
Affiliation(s)
- Xingzhong Zhao
- Institute of Science and Technology for Brain-Inspired Intelligence, and Department of Neurology of Zhongshan Hospital, Fudan University, 220 Handan Road, Shanghai, 200433, China
- MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, 200433, China
| | - Liting Song
- Institute of Science and Technology for Brain-Inspired Intelligence, and Department of Neurology of Zhongshan Hospital, Fudan University, 220 Handan Road, Shanghai, 200433, China
- MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, 200433, China
| | - Anyi Yang
- Institute of Science and Technology for Brain-Inspired Intelligence, and Department of Neurology of Zhongshan Hospital, Fudan University, 220 Handan Road, Shanghai, 200433, China
- MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, 200433, China
| | - Zichao Zhang
- Institute of Science and Technology for Brain-Inspired Intelligence, and Department of Neurology of Zhongshan Hospital, Fudan University, 220 Handan Road, Shanghai, 200433, China
- MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, 200433, China
| | - Jinglong Zhang
- Institute of Science and Technology for Brain-Inspired Intelligence, and Department of Neurology of Zhongshan Hospital, Fudan University, 220 Handan Road, Shanghai, 200433, China
- MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, 200433, China
| | - Yucheng T Yang
- Institute of Science and Technology for Brain-Inspired Intelligence, and Department of Neurology of Zhongshan Hospital, Fudan University, 220 Handan Road, Shanghai, 200433, China.
- MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, 200433, China.
| | - Xing-Ming Zhao
- Institute of Science and Technology for Brain-Inspired Intelligence, and Department of Neurology of Zhongshan Hospital, Fudan University, 220 Handan Road, Shanghai, 200433, China.
- MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, 200433, China.
- State Key Laboratory of Medical Neurobiology, Institutes of Brain Science, Fudan University, Shanghai, 200032, China.
- International Human Phenome Institutes (Shanghai), Shanghai, 200433, China.
| |
|
14
|
Baur B, Shin J, Schreiber J, Zhang S, Zhang Y, Manjunath M, Song JS, Stafford Noble W, Roy S. Leveraging epigenomes and three-dimensional genome organization for interpreting regulatory variation. PLoS Comput Biol 2023; 19:e1011286. [PMID: 37428809 PMCID: PMC10358954 DOI: 10.1371/journal.pcbi.1011286] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/29/2022] [Accepted: 06/20/2023] [Indexed: 07/12/2023] Open
Abstract
Understanding the impact of regulatory variants on complex phenotypes is a significant challenge because the genes and pathways that are targeted by such variants and the cell type context in which regulatory variants operate are typically unknown. Cell-type-specific long-range regulatory interactions that occur between a distal regulatory sequence and a gene offer a powerful framework for examining the impact of regulatory variants on complex phenotypes. However, high-resolution maps of such long-range interactions are available only for a handful of cell types. Furthermore, identifying specific gene subnetworks or pathways that are targeted by a set of variants is a significant challenge. We have developed L-HiC-Reg, a Random Forests regression method to predict high-resolution contact counts in new cell types, and a network-based framework to identify candidate cell-type-specific gene networks targeted by a set of variants from a genome-wide association study (GWAS). We applied our approach to predict interactions in 55 Roadmap Epigenomics Mapping Consortium cell types, which we used to interpret regulatory single nucleotide polymorphisms (SNPs) in the NHGRI-EBI GWAS catalogue. Using our approach, we performed an in-depth characterization of fifteen different phenotypes including schizophrenia, coronary artery disease (CAD) and Crohn's disease. We found differentially wired subnetworks consisting of known as well as novel gene targets of regulatory SNPs. Taken together, our compendium of interactions and the associated network-based analysis pipeline leverages long-range regulatory interactions to examine the context-specific impact of regulatory variation in complex phenotypes.
Affiliation(s)
- Brittany Baur
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
| | - Junha Shin
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
| | - Jacob Schreiber
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington, United States of America
| | - Shilu Zhang
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
| | - Yi Zhang
- Department of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
| | - Mohith Manjunath
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
| | - Jun S Song
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Department of Physics, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Cancer Center at Illinois, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
| | - William Stafford Noble
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington, United States of America
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
| | - Sushmita Roy
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
| |
|
15
|
Zheng L, Liu L, Zhu W, Ding Y, Wu F. Predicting enhancer-promoter interaction based on epigenomic signals. Front Genet 2023; 14:1133775. [PMID: 37144127 PMCID: PMC10151517 DOI: 10.3389/fgene.2023.1133775] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/29/2022] [Accepted: 04/04/2023] [Indexed: 05/06/2023] Open
Abstract
Introduction: The physical interactions between enhancers and promoters are often involved in gene transcriptional regulation. Highly tissue-specific enhancer-promoter interactions (EPIs) are responsible for the differential expression of genes. Experimental methods for measuring EPIs are time-consuming and labor-intensive. An alternative approach, machine learning, has been widely used to predict EPIs. However, most existing machine learning methods require a large number of functional genomic and epigenomic features as input, which limits their application to different cell lines. Methods: In this paper, we developed a random forest model, HARD (H3K27ac, ATAC-seq, RAD21, and Distance), to predict EPIs using only four types of features. Results: Independent tests on a benchmark dataset showed that HARD outperforms other models while using the fewest features. Discussion: Our results revealed that chromatin accessibility and the binding of cohesin are important for cell-line-specific EPIs. Furthermore, we trained the HARD model on the GM12878 cell line and tested it on the HeLa cell line. The cross-cell-line prediction also performs well, suggesting that the model has the potential to be applied to other cell lines.
Affiliation(s)
- Leqiong Zheng
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
| | - Li Liu
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
| | - Wen Zhu
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
| | - Yijie Ding
- Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
| | - Fangxiang Wu
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
| |
|
16
|
Zhu X, Huang Q, Luo J, Kong D, Zhang Y. Mini-review: Gene regulatory network benefits from three-dimensional chromatin conformation and structural biology. Comput Struct Biotechnol J 2023; 21:1728-1737. [PMID: 36890880 PMCID: PMC9986247 DOI: 10.1016/j.csbj.2023.02.028] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/31/2022] [Revised: 02/15/2023] [Accepted: 02/15/2023] [Indexed: 02/18/2023] Open
Abstract
Gene regulatory networks are now at the forefront of precision biology. They can help researchers better understand how genes and regulatory elements interact to control cellular gene expression, offering a more promising molecular mechanism in biological research. Interactions between genes and regulatory elements involve different promoters, enhancers, transcription factors, silencers, insulators, and long-range regulatory elements, and they occur within a ∼10 µm nucleus in a spatiotemporal manner. Three-dimensional chromatin conformation and structural biology are therefore critical for interpreting the biological effects and the gene regulatory networks. In this review, we briefly summarize the latest progress in three-dimensional chromatin conformation, microscopic imaging, and bioinformatics, and we present the outlook and future directions for these three aspects.
Affiliation(s)
- Xiusheng Zhu
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
| | - Qitong Huang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China; Animal Breeding and Genomics, Wageningen University & Research, Wageningen 6708PB, the Netherlands
| | - Jing Luo
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
| | - Dashuai Kong
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China; School of Life Sciences, Henan University, Kaifeng 475004, China
| | - Yubo Zhang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China; College of Life Science and Engineering, Foshan University, Foshan, China
| |
|
17
|
Liu S, Xu X, Yang Z, Zhao X, Liu S, Zhang W. EPIHC: Improving Enhancer-Promoter Interaction Prediction by Using Hybrid Features and Communicative Learning. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:3435-3443. [PMID: 34473626 DOI: 10.1109/tcbb.2021.3109488] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/13/2023]
Abstract
Enhancer-promoter interactions (EPIs) regulate the expression of specific genes in cells, and identifying them helps in understanding gene regulation, cell differentiation and disease mechanisms. Identifying EPIs through wet-lab experiments is often costly and time-consuming, which creates demand for efficient computational methods. In this paper, we propose a deep neural network-based method named EPIHC to predict Enhancer-Promoter Interactions with Hybrid features and Communicative learning. EPIHC extracts enhancer and promoter sequence-derived features using convolutional neural networks (CNNs), and a communicative learning module then captures the communicative information between enhancer and promoter sequences. In addition, EPIHC takes the genomic features of enhancers and promoters into account, combining them with the sequence-derived features to predict EPIs. The computational experiments show that EPIHC outperforms the existing state-of-the-art EPI prediction methods on the benchmark datasets and chromosome-split datasets. The study also reveals that the communicative learning module can capture explicit information about EPIs that the CNN ignores, and that it provides some degree of explainability about EPIs. Moreover, we consider two strategies to improve the performance of EPIHC in cross-cell line prediction, and experimental results show that EPIHC models built on some cell lines can perform well on other cell lines. The code and data are available at https://github.com/BioMedicalBigDataMiningLab/EPIHC.
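Sequence-based CNN models of the kind described in this abstract usually take DNA as a numeric matrix. The sketch below shows one common way to build such an input, one-hot encoding. It is an illustration only and is not taken from the EPIHC code: the function name `one_hot_encode`, the A, C, G, T column order, and the all-zero row for ambiguous bases are all assumptions.

```python
# Illustrative sketch only; not EPIHC's implementation.
# Assumptions: column order A, C, G, T; any other symbol (e.g. N) maps to zeros.
BASES = "ACGT"


def one_hot_encode(sequence):
    """Return one 4-element row per base of `sequence`, in A, C, G, T order."""
    rows = []
    for base in sequence.upper():
        row = [0, 0, 0, 0]
        index = BASES.find(base)  # -1 when the base is not A, C, G or T
        if index >= 0:
            row[index] = 1
        rows.append(row)
    return rows


# Hypothetical 5-base enhancer fragment; real inputs span thousands of bases.
enhancer_matrix = one_hot_encode("ACGTN")
for base, row in zip("ACGTN", enhancer_matrix):
    print(base, row)
```

A model would receive one such matrix for the enhancer and one for the promoter, and convolution filters sliding along the rows would then act as learned motif detectors.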
|
18
|
Zhang P, Wu Y, Zhou H, Zhou B, Zhang H, Wu H. CLNN-loop: a deep learning model to predict CTCF-mediated chromatin loops in the different cell lines and CTCF-binding sites (CBS) pair types. Bioinformatics 2022; 38:4497-4504. [PMID: 35997565 DOI: 10.1093/bioinformatics/btac575] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 03/09/2022] [Revised: 06/28/2022] [Accepted: 08/22/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION Three-dimensional (3D) genome organization is of vital importance in gene regulation and disease mechanisms. Previous studies have shown that CTCF-mediated chromatin loops are crucial to studying the 3D structure of cells. Although various experimental techniques have been developed to detect chromatin loops, they have been found to be time-consuming and costly. Nowadays, various sequence-based computational methods can capture significant features of 3D genome organization and help predict chromatin loops. However, these methods have low performance and poor generalization ability in predicting chromatin loops. RESULTS Here, we propose a novel deep learning model, called CLNN-loop, to predict chromatin loops in different cell lines and CTCF-binding sites (CBS) pair types by fusing multiple sequence-based features. The analysis of a series of examinations based on the datasets in the previous study shows that CLNN-loop has satisfactory performance and is superior to the existing methods in terms of predicting chromatin loops. In addition, we apply the SHAP framework to interpret the predictions of different models, and find that CTCF motif and sequence conservation are important signs of chromatin loops in different cell lines and CBS pair types. AVAILABILITY AND IMPLEMENTATION The source code of CLNN-loop is freely available at https://github.com/HaoWuLab-Bioinformatics/CLNN-loop and the webserver of CLNN-loop is freely available at http://hwclnn.sdu.edu.cn. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Pengyu Zhang
- School of Software, Shandong University, Jinan, Shandong 250101, China; College of Information Engineering, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Yingfu Wu
- College of Information Engineering, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Haoru Zhou
- College of Information Engineering, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Bing Zhou
- College of Information Engineering, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Hongming Zhang
- College of Information Engineering, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Hao Wu
- School of Software, Shandong University, Jinan, Shandong 250101, China
| |
|
19
|
Tang L, Zhong Z, Lin Y, Yang Y, Wang J, Martin J, Li M. EPIXplorer: A web server for prediction, analysis and visualization of enhancer-promoter interactions. Nucleic Acids Res 2022; 50:W290-W297. [PMID: 35639508 PMCID: PMC9252822 DOI: 10.1093/nar/gkac397] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2022] [Revised: 05/01/2022] [Accepted: 05/05/2022] [Indexed: 11/13/2022] Open
Abstract
Long-distance enhancers can physically interact with promoters to regulate gene expression through the formation of enhancer-promoter (E-P) interactions. Identification of E-P interactions is also important for a deeper understanding of normal development and disease-associated risk variants. Although state-of-the-art computational prediction methods facilitate the identification of E-P interactions to a certain extent, there is currently no efficient method that meets the various usage requirements. Here we developed EPIXplorer, a user-friendly web server for efficient prediction, analysis and visualization of E-P interactions. EPIXplorer integrates 9 robust predictive algorithms and supports multiple types of 3D contact data and multi-omics data as input. The output from EPIXplorer is scored and fully annotated with regulatory elements and risk single-nucleotide polymorphisms (SNPs). In addition, the Visualization and Downstream modules provide further functional analysis, and all output files and high-quality images are available for download. Together, EPIXplorer provides a user-friendly interface for predicting E-P interactions in an acceptable time and for understanding how genome-wide association study (GWAS) variants influence disease pathology by altering DNA looping between enhancers and their target gene promoters. EPIXplorer is available at https://www.csuligroup.com/EPIXplorer.
Affiliation(s)
- Li Tang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Zhizhou Zhong
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Yisheng Lin
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Yifei Yang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Jun Wang
- Department of Pediatrics, McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
| | - James F Martin
- Department of Molecular Physiology and Biophysics, Baylor College of Medicine, Houston, TX 77030, USA
- Cardiovascular Research Institute, Baylor College of Medicine, Houston, TX 77030, USA
- Texas Heart Institute, Houston, TX 77030, USA
| | - Min Li
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
| |
|
20
|
Mora A, Huang X, Jauhari S, Jiang Q, Li X. Chromatin Hubs: A biological and computational outlook. Comput Struct Biotechnol J 2022; 20:3796-3813. [PMID: 35891791 PMCID: PMC9304431 DOI: 10.1016/j.csbj.2022.07.002] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 04/14/2022] [Revised: 07/02/2022] [Accepted: 07/02/2022] [Indexed: 11/20/2022] Open
Abstract
This review discusses our current understanding of chromatin biology and bioinformatics under the unifying concept of “chromatin hubs.” The first part reviews the biology of chromatin hubs, including chromatin–chromatin interaction hubs, chromatin hubs at the nuclear periphery, hubs around macromolecules such as RNA polymerase or lncRNAs, and hubs around nuclear bodies such as the nucleolus or nuclear speckles. The second part reviews existing computational methods, including enhancer–promoter interaction prediction, network analysis, chromatin domain callers, transcription factory predictors, and multi-way interaction analysis. We introduce an integrated model that makes sense of the existing evidence. Understanding chromatin hubs may allow us (i) to explain long-unsolved biological questions such as interaction specificity and redundancy of mechanisms, (ii) to develop more realistic kinetic and functional predictions, and (iii) to explain the etiology of genomic disease.
Affiliation(s)
- Antonio Mora
- Joint School of Life Sciences, Guangzhou Medical University and Guangzhou Institutes of Biomedicine and Health (Chinese Academy of Sciences), Guangzhou 511436, PR China
- Corresponding authors.
| | - Xiaowei Huang
- Joint School of Life Sciences, Guangzhou Medical University and Guangzhou Institutes of Biomedicine and Health (Chinese Academy of Sciences), Guangzhou 511436, PR China
| | - Shaurya Jauhari
- Joint School of Life Sciences, Guangzhou Medical University and Guangzhou Institutes of Biomedicine and Health (Chinese Academy of Sciences), Guangzhou 511436, PR China
| | - Qin Jiang
- Affiliated Eye Hospital of Nanjing Medical University, Nanjing 210000, PR China
| | - Xuri Li
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, and Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangzhou 510060, PR China
- Corresponding authors.
| |
|
21
|
Alinejad-Rokny H, Ghavami Modegh R, Rabiee HR, Ramezani Sarbandi E, Rezaie N, Tam KT, Forrest ARR. MaxHiC: A robust background correction model to identify biologically relevant chromatin interactions in Hi-C and capture Hi-C experiments. PLoS Comput Biol 2022; 18:e1010241. [PMID: 35749574 PMCID: PMC9262194 DOI: 10.1371/journal.pcbi.1010241] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/10/2021] [Revised: 07/07/2022] [Accepted: 05/23/2022] [Indexed: 12/13/2022] Open
Abstract
Hi-C is a genome-wide chromosome conformation capture technology that detects interactions between pairs of genomic regions and exploits higher order chromatin structures. Conceptually Hi-C data counts interaction frequencies between every position in the genome and every other position. Biologically functional interactions are expected to occur more frequently than transient background and artefactual interactions. To identify biologically relevant interactions, several background models that take biases such as distance, GC content and mappability into account have been proposed. Here we introduce MaxHiC, a background correction tool that deals with these complex biases and robustly identifies statistically significant interactions in both Hi-C and capture Hi-C experiments. MaxHiC uses a negative binomial distribution model and a maximum likelihood technique to correct biases in both Hi-C and capture Hi-C libraries. We systematically benchmark MaxHiC against major Hi-C background correction tools including Hi-C significant interaction callers (SIC) and Hi-C loop callers using published Hi-C, capture Hi-C, and Micro-C datasets. Our results demonstrate that 1) Interacting regions identified by MaxHiC have significantly greater levels of overlap with known regulatory features (e.g. active chromatin histone marks, CTCF binding sites, DNase sensitivity) and also disease-associated genome-wide association SNPs than those identified by currently existing models, 2) the pairs of interacting regions are more likely to be linked by eQTL pairs and 3) more likely to link known regulatory features including known functional enhancer-promoter pairs validated by CRISPRi than any of the existing methods. We also demonstrate that interactions between different genomic region types have distinct distance distributions only revealed by MaxHiC. MaxHiC is publicly available as a python package for the analysis of Hi-C, capture Hi-C and Micro-C data.
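The abstract states that MaxHiC uses a negative binomial distribution to separate significant interactions from background. As a rough illustration of that idea, the sketch below computes the upper-tail probability of an observed read count under a negative binomial background. It is not MaxHiC's model: MaxHiC fits its parameters by maximum likelihood and accounts for several biases, whereas here the expected count and the dispersion are simply passed in, and the function names `nb_log_pmf` and `nb_upper_tail` are hypothetical.

```python
import math

# Illustrative sketch only; not MaxHiC's implementation.
# Assumption: the background for a bin pair is summarized by an expected
# count `mean` (> 0) and a dispersion `dispersion` (> 0), with
# variance = mean + mean**2 / dispersion.


def nb_log_pmf(k, mean, dispersion):
    """Log-probability of observing exactly k reads under the background."""
    p = dispersion / (dispersion + mean)
    return (
        math.lgamma(k + dispersion)
        - math.lgamma(k + 1)
        - math.lgamma(dispersion)
        + dispersion * math.log(p)
        + k * math.log(1.0 - p)
    )


def nb_upper_tail(observed, mean, dispersion):
    """P(X >= observed): chance the background alone yields this many reads."""
    below = sum(math.exp(nb_log_pmf(k, mean, dispersion)) for k in range(observed))
    return max(0.0, 1.0 - below)


# Hypothetical bin pair: 5 reads expected from background, 30 observed.
print(nb_upper_tail(30, mean=5.0, dispersion=2.0))
```

In practice the expected count would depend on genomic distance and other biases, and the resulting p-values would be corrected for multiple testing across all bin pairs.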
Affiliation(s)
- Hamid Alinejad-Rokny
- Harry Perkins Institute of Medical Research, QEII Medical Centre and Centre for Medical Research, The University of Western Australia, Perth, Australia
- Bio Medical Machine Learning Lab (BML), The Graduate School of Biomedical Engineering, UNSW Sydney, Sydney, Australia
- Health Data Analytics Program, AI-enabled Processes (AIP) Research Centre, Macquarie University, Sydney, Australia
- * E-mail: (HAR); (ARRF)
| | - Rassa Ghavami Modegh
- Bioinformatics and Computational Biology Lab, Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
| | - Hamid R. Rabiee
- Bioinformatics and Computational Biology Lab, Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
| | - Ehsan Ramezani Sarbandi
- Bioinformatics and Computational Biology Lab, Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
| | - Narges Rezaie
- Center for Complex Biological Systems, University of California Irvine, Irvine, California, United States of America
| | - Kin Tung Tam
- Harry Perkins Institute of Medical Research, QEII Medical Centre and Centre for Medical Research, The University of Western Australia, Perth, Australia
| | - Alistair R. R. Forrest
- Harry Perkins Institute of Medical Research, QEII Medical Centre and Centre for Medical Research, The University of Western Australia, Perth, Australia
- * E-mail: (HAR); (ARRF)
| |
|
22
|
Avdeyev P, Zhou J. Computational Approaches for Understanding Sequence Variation Effects on the 3D Genome Architecture. Annu Rev Biomed Data Sci 2022; 5:183-204. [PMID: 35537461 DOI: 10.1146/annurev-biodatasci-102521-012018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
Decoding how genomic sequence and its variations affect 3D genome architecture is indispensable for understanding the genetic architecture of various traits and diseases. The 3D genome organization can be significantly altered by genome variations and in turn impact the function of the genomic sequence. Techniques for measuring the 3D genome architecture across spatial scales have opened up new possibilities for understanding how the 3D genome depends upon the genomic sequence and how it can be altered by sequence variations. Computational methods have become instrumental in analyzing and modeling the sequence effects on 3D genome architecture, and recent developments in deep learning sequence models have opened up new opportunities for studying the interplay between sequence variations and the 3D genome. In this review, we focus on computational approaches for both the detection and modeling of sequence variation effects on the 3D genome, and we discuss the opportunities presented by these approaches. Expected final online publication date for the Annual Review of Biomedical Data Science, Volume 5 is August 2022. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Affiliation(s)
- Pavel Avdeyev
- Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, Texas, USA
| | - Jian Zhou
- Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, Texas, USA
| |
|
23
|
Qin T, Lee C, Li S, Cavalcante RG, Orchard P, Yao H, Zhang H, Wang S, Patil S, Boyle AP, Sartor MA. Comprehensive enhancer-target gene assignments improve gene set level interpretation of genome-wide regulatory data. Genome Biol 2022; 23:105. [PMID: 35473573 PMCID: PMC9044877 DOI: 10.1186/s13059-022-02668-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 10/09/2020] [Accepted: 04/06/2022] [Indexed: 11/16/2022] Open
Abstract
BACKGROUND Revealing the gene targets of distal regulatory elements is challenging yet critical for interpreting regulome data. Experiment-derived enhancer-gene links are restricted to a small set of enhancers and/or cell types, while the accuracy of genome-wide approaches remains elusive due to the lack of a systematic evaluation. We combined multiple spatial and in silico approaches for defining enhancer locations and linking them to their target genes aggregated across >500 cell types, generating 1860 human genome-wide distal enhancer-to-target gene definitions (EnTDefs). To evaluate performance, we used gene set enrichment (GSE) testing on 87 independent ENCODE ChIP-seq datasets of 34 transcription factors (TFs) and assessed concordance of results with known TF Gene Ontology annotations, and other benchmarks. RESULTS The top ranked 741 (40%) EnTDefs significantly outperform the common, naïve approach of linking distal regions to the nearest genes, and the top 10 EnTDefs perform well when applied to ChIP-seq data of other cell types. The GSE-based ranking of EnTDefs is highly concordant with ranking based on overlap with curated benchmarks of enhancer-gene interactions. Both our top general EnTDef and cell-type-specific EnTDefs significantly outperform seven independent computational and experiment-based enhancer-gene pair datasets. We show that using our top EnTDefs for GSE with either genome-wide DNA methylation or ATAC-seq data is able to better recapitulate the biological processes changed in gene expression data performed in parallel for the same experiment than our lower-ranked EnTDefs. CONCLUSIONS Our findings illustrate the power of our approach to provide genome-wide interpretation regardless of cell type.
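The baseline that the EnTDefs are compared against, linking each distal region to its nearest gene, can be sketched in a few lines. This is only an illustration of that naive approach on a single chromosome. The function name `nearest_gene`, the gene names, and the coordinates are hypothetical, and the tie-breaking rule (prefer the lower-coordinate gene) is an assumption.

```python
import bisect

# Illustrative sketch of the naive nearest-gene baseline; not the EnTDef method.
# Assumption: all positions lie on one chromosome and each gene is
# represented by a single transcription start site (TSS).


def nearest_gene(position, tss_sorted):
    """Return the gene whose TSS is closest to `position`.

    `tss_sorted` is a non-empty list of (tss_position, gene_name) tuples
    sorted by position. Ties go to the gene at the lower coordinate.
    """
    positions = [tss for tss, _ in tss_sorted]
    i = bisect.bisect_left(positions, position)
    # Only the TSSs immediately left and right of the position can be nearest.
    candidates = tss_sorted[max(0, i - 1): i + 1]
    return min(candidates, key=lambda item: abs(item[0] - position))[1]


# Hypothetical TSS annotation and enhancer midpoints.
tss_annotation = [(1000, "GENE_A"), (5000, "GENE_B"), (9000, "GENE_C")]
for enhancer_midpoint in (2500, 7500, 10000):
    print(enhancer_midpoint, nearest_gene(enhancer_midpoint, tss_annotation))
```

Because this rule ignores chromatin contacts and cell type, it assigns every enhancer to exactly one gene regardless of context, which is the limitation the abstract's comparison addresses.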
Affiliation(s)
- Tingting Qin
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA.
| | - Christopher Lee
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA
- Department of Biostatistics, School of Public Health, University of Michigan Medical School, Ann Arbor, MI, USA
| | - Shiting Li
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA
| | - Raymond G Cavalcante
- Biomedical Research Core Facilities, Epigenomics Core, University of Michigan Medical School, Ann Arbor, MI, USA
| | - Peter Orchard
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA
| | - Heming Yao
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA
| | - Hanrui Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA
| | - Shuze Wang
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA
| | - Snehal Patil
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA
| | - Alan P Boyle
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, USA
| | - Maureen A Sartor
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA.
- Department of Biostatistics, School of Public Health, University of Michigan Medical School, Ann Arbor, MI, USA.
| |
Collapse
|
24
|
Sefer E. ProbC: joint modeling of epigenome and transcriptome effects in 3D genome. BMC Genomics 2022; 23:287. [PMID: 35397520 PMCID: PMC8994916 DOI: 10.1186/s12864-022-08498-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/20/2021] [Accepted: 03/23/2022] [Indexed: 11/30/2022] Open
Abstract
BACKGROUND Hi-C and its nucleosome-resolution variant Micro-C provide a window into the spatial packing of a genome in 3D within the cell. Even though neither technique depends directly on the binding of specific antibodies, previous work has revealed enriched interactions and domain structures around multiple chromatin marks: epigenetic modifications and transcription factor binding sites. However, the joint impact of chromatin marks on Hi-C and Micro-C interactions has not been globally characterized, which limits our understanding of 3D genome characteristics. An emerging question is whether it is possible to deduce 3D genome characteristics and interactions by integrative analysis of multiple chromatin marks and to associate interactions with the functionality of the interacting loci. RESULT We propose a probabilistic method, PROBC, to decompose Hi-C and Micro-C interactions by known chromatin marks. PROBC is based on convex likelihood optimization, which can directly take into account both interaction existence and nonexistence. Through PROBC, we discover histone modifications (H3K27ac, H3K9me3, H3K4me3, H3K4me1) and CTCF as particularly predictive of Hi-C and Micro-C contacts across cell types and species. Moreover, histone modifications are more effective than transcription factor binding sites in explaining the genome's 3D shape through these interactions. PROBC can successfully predict Hi-C and Micro-C interactions in a given species even when it is trained on different cell types or species. For instance, when trained on mouse ES cells, it can predict missing nucleosome-resolution Micro-C interactions in human ES cells from only these 5 chromatin marks with an AUC above 0.75. Additionally, PROBC outperforms the existing methods in predicting interactions across almost all chromosomes. CONCLUSION Via our proposed method, we optimally decompose Hi-C interactions in terms of these chromatin marks at genome and chromosome levels. We find a subset of histone modifications and transcription factor binding sites to be predictive of both Hi-C and Micro-C interactions and TADs across human, mouse, and different cell types. Through learned models, we can predict interactions from chromatin marks alone in species for which Hi-C data may be limited.
Affiliation(s)
- Emre Sefer
  - Department of Computer Science, Ozyegin University, Istanbul, Turkey.

25
Wang S, Hu H, Li X. A systematic study of motif pairs that may facilitate enhancer-promoter interactions. J Integr Bioinform 2022; 19:jib-2021-0038. [PMID: 35130376 PMCID: PMC9069648 DOI: 10.1515/jib-2021-0038] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/15/2021] [Accepted: 01/20/2022] [Indexed: 01/06/2023] Open
Abstract
Pairs of interacting transcription factors (TFs) have previously been shown to bind to enhancers and promoters and contribute to their physical interactions. However, to date, we have limited knowledge about such TF pairs. To fill this void, we systematically studied the co-occurrence of TF-binding motifs in interacting enhancer-promoter (EP) pairs in seven human cell lines. We discovered 423 motif pairs that significantly co-occur in enhancers and promoters of interacting EP pairs. We demonstrated that these motif pairs are biologically meaningful and significantly enriched with motif pairs of known interacting TF pairs. We also showed that the identified motif pairs facilitated the discovery of the interacting EP pairs. The developed pipeline, EPmotifPair, together with the predicted motifs and motif pairs, is available at https://doi.org/10.6084/m9.figshare.14192000. Our study provides a comprehensive list of motif pairs that may contribute to EP physical interactions, which facilitates the generation of meaningful hypotheses for experimental validation.
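The abstract does not say which statistical test underlies "significantly co-occur". One common choice for this kind of question is a one-sided hypergeometric test, sketched below purely as an assumption; the function name `hypergeom_sf` and the counts are hypothetical and are not taken from EPmotifPair.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: total interacting EP pairs
    K: pairs whose enhancer contains motif A
    n: pairs whose promoter contains motif B
    k: pairs in which both motifs are observed
    """
    upper = min(K, n)
    total = comb(N, n)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total

# Hypothetical counts: 10 EP pairs, motif A in 4 enhancers,
# motif B in 5 promoters, both motifs together in 4 pairs.
p_value = hypergeom_sf(4, 10, 4, 5)
```

A small p-value indicates that the two motifs appear together in interacting pairs more often than independent placement would predict; a real analysis would also need multiple-testing correction across all motif pairs.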
Affiliation(s)
- Saidi Wang
  - Department of Computer Science, University of Central Florida, Orlando, FL, 32816, USA
- Haiyan Hu
  - Department of Computer Science, University of Central Florida, Orlando, FL, 32816, USA
- Xiaoman Li
  - Burnett School of Biomedical Science, College of Medicine, University of Central Florida, Orlando, FL, 32816, USA

26
Hait TA, Elkon R, Shamir R. CT-FOCS: a novel method for inferring cell type-specific enhancer–promoter maps. Nucleic Acids Res 2022; 50:e55. [PMID: 35100425 PMCID: PMC9178001 DOI: 10.1093/nar/gkac048] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2021] [Revised: 01/09/2022] [Accepted: 01/15/2022] [Indexed: 11/13/2022] Open
Abstract
Spatiotemporal gene expression patterns are governed to a large extent by the activity of enhancer elements, which engage in physical contacts with their target genes. Identification of enhancer–promoter (EP) links that are functional only in a specific subset of cell types is a key challenge in understanding gene regulation. We introduce CT-FOCS (cell type FOCS), a statistical inference method that uses linear mixed effect models to infer EP links that show marked activity only in a single or a small subset of cell types out of a large panel of probed cell types. Analyzing 808 samples from FANTOM5, covering 472 cell lines, primary cells and tissues, CT-FOCS inferred such EP links more accurately than recent state-of-the-art methods. Furthermore, we show that strictly cell type-specific EP links are very uncommon in the human genome.
Affiliation(s)
- Tom Aharon Hait
  - The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel
  - Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
- Ran Elkon
  - Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
  - Sagol School of Neuroscience, Tel Aviv University, Tel Aviv 69978, Israel
- Ron Shamir
  - The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel

27
Shen Y, Zhong Q, Liu T, Wen Z, Shen W, Li L. CharID: a two-step model for universal prediction of interactions between chromatin accessible regions. Brief Bioinform 2022; 23:6514800. [PMID: 35077535 DOI: 10.1093/bib/bbab602] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2021] [Revised: 12/23/2021] [Accepted: 12/24/2021] [Indexed: 11/14/2022] Open
Abstract
Open chromatin regions (OCRs) allow direct interaction between cis-regulatory elements and trans-acting factors. Therefore, predicting all potential OCR-mediated loops is essential for deciphering the regulation mechanism of gene expression. However, existing loop prediction tools are restricted to specific anchor types. Here, we present CharID (Chromatin Accessible Region Interaction Detector), a two-step model that combines a neural network and ensemble learning to predict OCR-mediated loops. In the first step, CharID-Anchor, an attention-based hybrid CNN-BiGRU network is constructed to discriminate between anchor and nonanchor OCRs. In the second step, CharID-Loop uses a gradient boosting decision tree with a chromosome-split strategy to predict the interactions between anchor OCRs. The performance was assessed in three human cell lines, and CharID showed superior prediction performance compared with other algorithms. In contrast to methods designed to predict a particular type of loop, CharID can detect a variety of chromatin loops, not limited to enhancer-promoter loops or architectural protein-mediated loops. We constructed the OCR-mediated interaction network using the predicted loops and identified hub anchors, which are highlighted by their proximity to housekeeping genes. By analyzing loops containing SNPs associated with cardiovascular disease, we identified an SNP-gene loop indicating the regulation mechanism of GFOD1. Taken together, CharID universally predicts diverse chromatin loops beyond other state-of-the-art methods, which are limited by anchor types, and experimental techniques, which are limited by sensitivities that decay drastically with the genomic distance between anchors. Finally, we hosted Peaksniffer, a user-friendly web server that provides online prediction, query and visualization of OCRs and associated loops.
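The abstract gives no detail on the chromosome-split strategy. The general idea usually meant by the term, holding out whole chromosomes so that training and test loops never share a chromosome, can be sketched as below. The function name `chromosome_split`, the record layout, and the coordinates are all hypothetical.

```python
def chromosome_split(pairs, test_chroms):
    """Split anchor pairs so that test chromosomes never appear in training.

    `pairs` is a list of dicts with at least a "chrom" key;
    `test_chroms` is the set of chromosomes reserved for testing.
    """
    train = [p for p in pairs if p["chrom"] not in test_chroms]
    test = [p for p in pairs if p["chrom"] in test_chroms]
    return train, test

# Hypothetical anchor pairs (coordinates and labels are illustrative only).
pairs = [
    {"chrom": "chr1", "anchor1": 10_000, "anchor2": 85_000, "label": 1},
    {"chrom": "chr1", "anchor1": 40_000, "anchor2": 300_000, "label": 0},
    {"chrom": "chr2", "anchor1": 5_000, "anchor2": 60_000, "label": 1},
    {"chrom": "chr3", "anchor1": 7_500, "anchor2": 90_000, "label": 0},
]
train, test = chromosome_split(pairs, test_chroms={"chr2"})
```

Splitting by chromosome rather than at random keeps overlapping or neighbouring anchors out of both sets at once, which would otherwise inflate test performance.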
Affiliation(s)
- Yin Shen
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, P. R. China
  - 3D Genomics Research Center, Huazhong Agricultural University, Wuhan, 430070, P. R. China
- Quan Zhong
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, P. R. China
  - 3D Genomics Research Center, Huazhong Agricultural University, Wuhan, 430070, P. R. China
- Tian Liu
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, P. R. China
- Zi Wen
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, P. R. China
  - 3D Genomics Research Center, Huazhong Agricultural University, Wuhan, 430070, P. R. China
- Wei Shen
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, P. R. China
  - 3D Genomics Research Center, Huazhong Agricultural University, Wuhan, 430070, P. R. China
- Li Li
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, P. R. China
  - 3D Genomics Research Center, Huazhong Agricultural University, Wuhan, 430070, P. R. China

28
Chen K, Zhao H, Yang Y. Capturing large genomic contexts for accurately predicting enhancer-promoter interactions. Brief Bioinform 2022; 23:6513727. [DOI: 10.1093/bib/bbab577] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2021] [Revised: 12/13/2021] [Accepted: 12/15/2021] [Indexed: 11/14/2022] Open
Abstract
Enhancer-promoter interaction (EPI) is a key mechanism underlying gene regulation. EPI prediction has always been a challenging task because enhancers could regulate promoters of distant target genes. Although many machine learning models have been developed, they leverage only the features in enhancers and promoters, or simply add the average genomic signals in the regions between enhancers and promoters, without utilizing detailed features between or outside enhancers and promoters. Due to a lack of large-scale features, existing methods could achieve only moderate performance, especially for predicting EPIs in different cell types. Here, we present a Transformer-based model, TransEPI, for EPI prediction by capturing large genomic contexts. TransEPI was developed based on EPI datasets derived from Hi-C or ChIA-PET data in six cell lines. To avoid over-fitting, we evaluated the TransEPI model by testing it on independent test datasets where the cell line and chromosome are different from the training data. TransEPI not only achieved consistent performance across the cross-validation and test datasets from different cell types but also outperformed the state-of-the-art machine learning and deep learning models. In addition, we found that the improved performance of TransEPI was attributed to the integration of large genomic contexts. Lastly, TransEPI was extended to study the non-coding mutations associated with brain disorders or neural diseases, and we found that TransEPI was also useful for predicting the target genes of non-coding mutations.

29
Li R, Li L, Xu Y, Yang J. Machine learning meets omics: applications and perspectives. Brief Bioinform 2021; 23:6425809. [PMID: 34791021 DOI: 10.1093/bib/bbab460] [Citation(s) in RCA: 60] [Impact Index Per Article: 15.0] [Received: 08/23/2021] [Revised: 09/29/2021] [Accepted: 10/07/2021] [Indexed: 02/07/2023] Open
Abstract
The innovation of biotechnologies has allowed the accumulation of omics data at an alarming rate, thus introducing the era of 'big data'. Extracting inherent valuable knowledge from various omics data remains a daunting problem in bioinformatics. Better solutions often require more innovative methods that handle the data efficiently and produce effective results. Recent advancements in integrated analysis and computational modeling of multi-omics data have helped address such needs in an increasingly harmonious manner. The development and application of machine learning have largely advanced our insights into biology and biomedicine and greatly promoted the development of therapeutic strategies, especially for precision medicine. Here, we present a comprehensive survey and discussion of what has happened, is happening and will happen when machine learning meets omics. Specifically, we describe how artificial intelligence can be applied to omics studies and review recent advancements at the interface between machine learning and an ever-widening range of omics, including genomics, transcriptomics, proteomics, metabolomics and radiomics, as well as those at single-cell resolution. We also discuss and provide a synthesis of ideas, new insights, current challenges and perspectives of machine learning in omics.
Affiliation(s)
- Rufeng Li
  - Department of Cell Biology and Genetics, School of Basic Medical Sciences, Xi'an Jiaotong University Health Science Center, Xi'an 710061, P. R. China
- Lixin Li
  - Department of Cell Biology and Genetics, School of Basic Medical Sciences, Xi'an Jiaotong University Health Science Center, Xi'an 710061, P. R. China
- Yungang Xu
  - School of Electronics and Information, Northwestern Polytechnical University, Xi'an, 710129, China
- Juan Yang
  - Department of Cell Biology and Genetics, School of Basic Medical Sciences, Xi'an Jiaotong University Health Science Center, Xi'an 710061, P. R. China
  - Key Laboratory of Environment and Genes Related to Diseases (Xi'an Jiaotong University), Ministry of Education of China, Xi'an 710061, P. R. China

30
Wang H, Huang B, Wang J. Predict long-range enhancer regulation based on protein-protein interactions between transcription factors. Nucleic Acids Res 2021; 49:10347-10368. [PMID: 34570239 PMCID: PMC8501976 DOI: 10.1093/nar/gkab841] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/20/2021] [Revised: 08/10/2021] [Accepted: 09/10/2021] [Indexed: 12/18/2022] Open
Abstract
Long-range regulation by distal enhancers plays critical roles in cell-type specific transcriptional programs. Computational predictions of genome-wide enhancer-promoter interactions are still challenging due to limited accuracy and the lack of knowledge on the molecular mechanisms. Based on recent biological investigations, the protein-protein interactions (PPIs) between transcription factors (TFs) have been found to participate in the regulation of chromatin loops. Therefore, we developed a novel predictive model for cell-type specific enhancer-promoter interactions by leveraging the information of TF PPI signatures. Evaluated by a series of rigorous performance comparisons, the new model achieves superior performance over other methods. The model also identifies specific TF PPIs that may mediate long-range regulatory interactions, revealing new mechanistic understandings of enhancer regulation. The prioritized TF PPIs are associated with genes in distinct biological pathways, and the predicted enhancer-promoter interactions are strongly enriched with cis-eQTLs. Most interestingly, the model discovers enhancer-mediated trans-regulatory links between TFs and genes, which are significantly enriched with trans-eQTLs. The new predictive model, along with the genome-wide analyses, provides a platform to systematically delineate the complex interplay among TFs, enhancers and genes in long-range regulation. The novel predictions also lead to mechanistic interpretations of eQTLs to decode the genetic associations with gene expression.
Affiliation(s)
- Hao Wang
  - Department of Computational Mathematics, Science and Engineering, Michigan State University, 428 S. Shaw Ln., East Lansing, MI 48824, USA
- Binbin Huang
  - Department of Computational Mathematics, Science and Engineering, Michigan State University, 428 S. Shaw Ln., East Lansing, MI 48824, USA
- Jianrong Wang
  - Department of Computational Mathematics, Science and Engineering, Michigan State University, 428 S. Shaw Ln., East Lansing, MI 48824, USA

31
Salviato E, Djordjilović V, Hariprakash JM, Tagliaferri I, Pal K, Ferrari F. Leveraging three-dimensional chromatin architecture for effective reconstruction of enhancer-target gene regulatory interactions. Nucleic Acids Res 2021; 49:e97. [PMID: 34197622 PMCID: PMC8464068 DOI: 10.1093/nar/gkab547] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 02/16/2021] [Revised: 06/07/2021] [Accepted: 06/17/2021] [Indexed: 12/23/2022] Open
Abstract
A growing amount of evidence in literature suggests that germline sequence variants and somatic mutations in non-coding distal regulatory elements may be crucial for defining disease risk and prognostic stratification of patients, in genetic disorders as well as in cancer. Their functional interpretation is challenging because genome-wide enhancer-target gene (ETG) pairing is an open problem in genomics. The solutions proposed so far do not account for the hierarchy of structural domains which define chromatin three-dimensional (3D) architecture. Here we introduce a change of perspective based on the definition of multi-scale structural chromatin domains, integrated in a statistical framework to define ETG pairs. In this work (i) we develop a computational and statistical framework to reconstruct a comprehensive map of ETG pairs leveraging functional genomics data; (ii) we demonstrate that the incorporation of chromatin 3D architecture information improves ETG pairing accuracy and (iii) we use multiple experimental datasets to extensively benchmark our method against previous solutions for the genome-wide reconstruction of ETG pairs. This solution will facilitate the annotation and interpretation of sequence variants in distal non-coding regulatory elements. We expect this to be especially helpful in clinically oriented applications of whole genome sequencing in cancer and undiagnosed genetic diseases research.
Affiliation(s)
- Elisa Salviato
  - IFOM, the FIRC Institute of Molecular Oncology, Milan 20139, Italy
- Vera Djordjilović
  - Department of Economics, Ca’ Foscari University of Venice, Venice 30100, Italy
- Koustav Pal
  - IFOM, the FIRC Institute of Molecular Oncology, Milan 20139, Italy
- Francesco Ferrari
  - IFOM, the FIRC Institute of Molecular Oncology, Milan 20139, Italy
  - Institute of Molecular Genetics “Luigi Luca Cavalli-Sforza”, National Research Council, Pavia 27100, Italy

32
Cao F, Zhang Y, Cai Y, Animesh S, Zhang Y, Akincilar SC, Loh YP, Li X, Chng WJ, Tergaonkar V, Kwoh CK, Fullwood MJ. Chromatin interaction neural network (ChINN): a machine learning-based method for predicting chromatin interactions from DNA sequences. Genome Biol 2021; 22:226. [PMID: 34399797 PMCID: PMC8365954 DOI: 10.1186/s13059-021-02453-5] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 02/05/2021] [Accepted: 08/04/2021] [Indexed: 11/10/2022] Open
Abstract
Chromatin interactions play important roles in regulating gene expression. However, the availability of genome-wide chromatin interaction data is limited. We develop a computational method, chromatin interaction neural network (ChINN), to predict chromatin interactions between open chromatin regions using only DNA sequences. ChINN predicts CTCF- and RNA polymerase II-associated and Hi-C chromatin interactions. ChINN shows good across-sample performances and captures various sequence features for chromatin interaction prediction. We apply ChINN to 6 chronic lymphocytic leukemia (CLL) patient samples and a published cohort of 84 CLL open chromatin samples. Our results demonstrate extensive heterogeneity in chromatin interactions among CLL patient samples.
Affiliation(s)
- Fan Cao
  - Cancer Science Institute of Singapore, National University of Singapore, 14 Medical Dr, Singapore, 117599 Singapore
- Yu Zhang
  - School of Computer Science and Engineering, Nanyang Technological University, Block N4, 50 Nanyang Avenue, Singapore, 639798 Singapore
- Yichao Cai
  - Cancer Science Institute of Singapore, National University of Singapore, 14 Medical Dr, Singapore, 117599 Singapore
- Sambhavi Animesh
  - Cancer Science Institute of Singapore, National University of Singapore, 14 Medical Dr, Singapore, 117599 Singapore
- Ying Zhang
  - Cancer Science Institute of Singapore, National University of Singapore, 14 Medical Dr, Singapore, 117599 Singapore
- Semih Can Akincilar
  - Institute of Molecular and Cell Biology (IMCB), A*STAR (Agency for Science, Technology and Research), Singapore, 138673 Singapore
- Yan Ping Loh
  - Cancer Science Institute of Singapore, National University of Singapore, 14 Medical Dr, Singapore, 117599 Singapore
- Xinya Li
  - School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore, 637551 Singapore
- Wee Joo Chng
  - Cancer Science Institute of Singapore, National University of Singapore, 14 Medical Dr, Singapore, 117599 Singapore
  - Department of Medicine, Yong Loo Lin School of Medicine, National University of Singapore, 1E Kent Ridge Road, Singapore, 119228 Singapore
  - Department of Haematology-Oncology, National University Cancer Institute, National University Health System, NUH Zone B, Medical Centre, Singapore, 119074 Singapore
- Vinay Tergaonkar
  - Institute of Molecular and Cell Biology (IMCB), A*STAR (Agency for Science, Technology and Research), Singapore, 138673 Singapore
  - Department of Pathology, Yong Loo Lin School of Medicine, National University of Singapore (NUS), Singapore, 117597 Singapore
- Chee Keong Kwoh
  - School of Computer Science and Engineering, Nanyang Technological University, Block N4, 50 Nanyang Avenue, Singapore, 639798 Singapore
- Melissa J. Fullwood
  - Cancer Science Institute of Singapore, National University of Singapore, 14 Medical Dr, Singapore, 117599 Singapore
  - Institute of Molecular and Cell Biology (IMCB), A*STAR (Agency for Science, Technology and Research), Singapore, 138673 Singapore
  - School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore, 637551 Singapore

33
Min X, Lu F, Li C. Sequence-Based Deep Learning Frameworks on Enhancer-Promoter Interactions Prediction. Curr Pharm Des 2021; 27:1847-1855. [PMID: 33234095 DOI: 10.2174/1381612826666201124112710] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/15/2020] [Revised: 07/29/2020] [Accepted: 08/06/2020] [Indexed: 11/22/2022]
Abstract
Enhancer-promoter interactions (EPIs) in the human genome are of great significance to transcriptional regulation, which tightly controls gene expression. Identification of EPIs can help us better decipher gene regulation and understand disease mechanisms. However, experimental methods to identify EPIs are constrained by funds, time, and manpower, while computational methods using DNA sequences and genomic features are viable alternatives. Deep learning methods have shown promise in classification and have been used to identify EPIs. In this survey, we specifically focus on sequence-based deep learning methods and conduct a comprehensive review of the literature. First, we briefly introduce existing sequence-based frameworks for EPI prediction and their technical details. After that, we elaborate on the datasets, pre-processing methods, and evaluation strategies. Finally, we conclude with the challenges these methods face and suggest several future opportunities. We hope this review will provide a useful reference for further studies on enhancer-promoter interactions.
Affiliation(s)
- Xiaoping Min
  - School of Informatics, Xiamen University, Xiamen 361005, China
- Fengqing Lu
  - School of Informatics, Xiamen University, Xiamen 361005, China
- Chunyan Li
  - Graduate School, Yunnan Minzu University, Kunming 650504, China

34
Schreiber J, Singh R. Machine learning for profile prediction in genomics. Curr Opin Chem Biol 2021; 65:35-41. [PMID: 34107341 DOI: 10.1016/j.cbpa.2021.04.008] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/31/2021] [Revised: 04/21/2021] [Accepted: 04/24/2021] [Indexed: 02/08/2023]
Abstract
A recent deluge of publicly available multi-omics data has fueled the development of machine learning methods aimed at investigating important questions in genomics. Although the motivations for these methods vary, a task that is commonly adopted is that of profile prediction, where predictions are made for one or more forms of biochemical activity along the genome, for example, histone modification, chromatin accessibility, or protein binding. In this review, we give an overview of the research works performing profile prediction, define two broad categories of profile prediction tasks, and discuss the types of scientific questions that can be answered in each.
Affiliation(s)
- Ritambhara Singh
  - Department of Computer Science, Center for Computational Molecular Biology, Brown University, United States.

35
Talukder A, Hu H, Li X. An intriguing characteristic of enhancer-promoter interactions. BMC Genomics 2021; 22:163. [PMID: 33685407 PMCID: PMC7938488 DOI: 10.1186/s12864-021-07440-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 03/18/2020] [Accepted: 02/12/2021] [Indexed: 01/22/2023] Open
Abstract
Background It is still challenging to predict interacting enhancer-promoter pairs (IEPs), partially because of our limited understanding of their characteristics. To understand IEPs better, here we studied the IEPs in nine cell lines and nine primary cell types. Results By measuring the bipartite clustering coefficient of the graphs constructed from these experimentally supported IEPs, we observed that one enhancer is likely to interact with either none or all of the target genes of another enhancer. This observation implies that enhancers form clusters, and every enhancer in the same cluster synchronously interacts with almost every member of a set of genes and only this set of genes. We found that an enhancer can be up to two megabase pairs away from other enhancers in the same cluster. We also noticed that although a fraction of these clusters of enhancers do overlap with super-enhancers, the majority of the enhancer clusters are different from the known super-enhancers. Conclusions Our study showed a new characteristic of IEPs, which may shed new light on distal gene regulation and the identification of IEPs. Supplementary Information The online version contains supplementary material available at (10.1186/s12864-021-07440-5).
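The abstract does not specify which bipartite clustering coefficient was used. One common pairwise definition for two-mode networks is the Jaccard overlap of the two nodes' neighbour sets, sketched here as an assumption; the function name `pairwise_overlap` and the enhancer and gene identifiers are hypothetical.

```python
from itertools import combinations

def pairwise_overlap(targets):
    """Jaccard overlap of target-gene sets for every pair of enhancers.

    `targets` maps an enhancer identifier to the set of genes it contacts.
    Pairs in which neither enhancer has any target are skipped.
    """
    overlaps = {}
    for a, b in combinations(sorted(targets), 2):
        union = targets[a] | targets[b]
        if union:
            overlaps[(a, b)] = len(targets[a] & targets[b]) / len(union)
    return overlaps

# Hypothetical enhancer-gene links (illustrative only).
targets = {
    "enh1": {"geneA", "geneB"},
    "enh2": {"geneA", "geneB"},
    "enh3": {"geneC"},
}
overlaps = pairwise_overlap(targets)
```

In this toy network every pair scores either 1.0 or 0.0, which is the "none or all" pattern the abstract describes; real data would show a distribution concentrated near those two extremes.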
Affiliation(s)
- Amlan Talukder
  - Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
- Haiyan Hu
  - Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA.
- Xiaoman Li
  - Burnett School of Biomedical Science, College of Medicine, University of Central Florida, Orlando, FL 32816, USA.

36
Lv H, Dao FY, Zulfiqar H, Su W, Ding H, Liu L, Lin H. A sequence-based deep learning approach to predict CTCF-mediated chromatin loop. Brief Bioinform 2021; 22:6149346. [PMID: 33634313 DOI: 10.1093/bib/bbab031] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 10/26/2020] [Revised: 12/01/2020] [Accepted: 01/21/2021] [Indexed: 12/13/2022] Open
Abstract
The three-dimensional (3D) architecture of chromosomes is of crucial importance for transcription regulation and DNA replication. Various high-throughput chromosome conformation capture-based methods have revealed that CTCF-mediated chromatin loops are a major component of 3D architecture. However, CTCF-mediated chromatin loops are cell type specific, and most chromatin interaction capture techniques are time-consuming and labor-intensive, which restricts their usage on a very large number of cell types. Genomic sequence-based computational models are sophisticated enough to capture important features of chromatin architecture and help to identify chromatin loops. In this work, we develop Deep-loop, a convolutional neural network model, to integrate k-tuple nucleotide frequency component, nucleotide pair spectrum encoding, position conservation, position scoring function and natural vector features for the prediction of chromatin loops. In a series of cross-validation experiments, Deep-loop shows excellent performance in identifying chromatin loops from different cell types. The source code of Deep-loop is freely available at the repository https://github.com/linDing-group/Deep-loop.
Affiliation(s)
- Hao Lv
- Informational Biology at University of Electronic Science and Technology of China
| | - Fu-Ying Dao
- Informational Biology at University of Electronic Science and Technology of China
| | - Hasan Zulfiqar
- Informational Biology at University of Electronic Science and Technology of China
| | - Wei Su
- Informational Biology at University of Electronic Science and Technology of China
| | - Hui Ding
- Informational Biology at University of Electronic Science and Technology of China
| | - Li Liu
- Laboratory of Theoretical Biophysics at Inner Mongolia University
| | - Hao Lin
- Informational Biology at University of Electronic Science and Technology of China
| |
|
37
|
Yu X, Zhou J, Zhao M, Yi C, Duan Q, Zhou W, Li J. Exploiting XG Boost for Predicting Enhancer-promoter Interactions. Curr Bioinform 2021. [DOI: 10.2174/1574893615666200120103948] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 11/22/2022]
Abstract
Background:
Gene expression and disease control are regulated by the interaction between distal enhancers and proximal promoters, and the study of enhancer-promoter interactions (EPIs) provides insight into the genetic basis of diseases.
Objective:
Although the recent emergence of high-throughput sequencing methods has deepened the understanding of EPIs, accurate prediction of EPIs still has limitations.
Methods:
We have implemented an XGBoost-based approach and introduced two sets of features (epigenomic and sequence) to predict the interactions between enhancers and promoters in different cell lines.
Results:
Extensive experimental results show that XGBoost effectively predicts EPIs across three cell lines, especially when using epigenomic and sequence features.
Conclusion:
XGBoost outperforms other methods, such as random forest, AdaBoost, GBDT, and TargetFinder.
Affiliation(s)
- Xiaojuan Yu
- Software School of Yunnan University, Kunming, China
| | - Jianguo Zhou
- Software School of Yunnan University, Kunming, China
| | - Mingming Zhao
- Software School of Yunnan University, Kunming, China
| | - Chao Yi
- Software School of Yunnan University, Kunming, China
| | - Qing Duan
- Software School of Yunnan University, Kunming, China
| | - Wei Zhou
- Software School of Yunnan University, Kunming, China
| | - Jin Li
- Software School of Yunnan University, Kunming, China
| |
|
38
|
Tao H, Li H, Xu K, Hong H, Jiang S, Du G, Wang J, Sun Y, Huang X, Ding Y, Li F, Zheng X, Chen H, Bo X. Computational methods for the prediction of chromatin interaction and organization using sequence and epigenomic profiles. Brief Bioinform 2021; 22:6102668. [PMID: 33454752 PMCID: PMC8424394 DOI: 10.1093/bib/bbaa405] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 08/21/2020] [Revised: 11/26/2020] [Accepted: 12/10/2020] [Indexed: 12/14/2022]
Abstract
The exploration of three-dimensional chromatin interaction and organization provides insight into mechanisms underlying gene regulation, cell differentiation and disease development. Advances in chromosome conformation capture technologies, such as high-throughput chromosome conformation capture (Hi-C) and chromatin interaction analysis by paired-end tag (ChIA-PET), have enabled the exploration of chromatin interaction and organization. However, high-resolution Hi-C and ChIA-PET data are only available for a limited number of cell lines, and their acquisition is costly, time-consuming, laborious and affected by theoretical limitations. Increasing evidence shows that DNA sequence and epigenomic features are informative predictors of regulatory interaction and chromatin architecture. Based on these features, numerous computational methods have been developed for the prediction of chromatin interaction and organization, but they are not yet widely applied in biomedical studies. A systematic study that summarizes and evaluates such methods is still needed to facilitate their application. Here, we summarize 48 computational methods for the prediction of chromatin interaction and organization using sequence and epigenomic profiles, categorize them and compare their performance. In addition, we provide a comprehensive guideline for selecting suitable methods to predict chromatin interaction and organization based on the available data and the biological question of interest.
Affiliation(s)
- Huan Tao
- Beijing Institute of Radiation Medicine
| | - Hao Li
- Beijing Institute of Radiation Medicine
| | - Kang Xu
- Beijing Institute of Radiation Medicine
| | - Hao Hong
- Beijing Institute of Radiation Medicine, Department of Biotechnology
| | - Shuai Jiang
- Beijing Institute of Radiation Medicine, Department of Biotechnology
| | - Guifang Du
- Beijing Institute of Radiation Medicine, Department of Biotechnology
| | | | - Yu Sun
- Beijing Institute of Radiation Medicine, Department of Biotechnology
| | - Xin Huang
- Beijing Institute of Radiation Medicine, Department of Biotechnology
| | - Yang Ding
- Beijing Institute of Radiation Medicine
| | - Fei Li
- Chinese Academy of Sciences, Department of Computer Network Information Center
| | | | | | | |
|
39
|
Jing F, Zhang SW, Zhang S. Prediction of enhancer-promoter interactions using the cross-cell type information and domain adversarial neural network. BMC Bioinformatics 2020; 21:507. [PMID: 33160328 PMCID: PMC7648314 DOI: 10.1186/s12859-020-03844-4] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 07/31/2020] [Accepted: 10/27/2020] [Indexed: 12/27/2022]
Abstract
BACKGROUND Enhancer-promoter interactions (EPIs) play key roles in transcriptional regulation and disease progression. Although several computational methods have been developed to predict such interactions, their performance is not satisfactory when the training and testing data come from different cell lines. Currently, it is still unclear to what extent cross-cell-line prediction can be made based on sequence-level information. RESULTS In this work, we present a novel Sequence-based method (called SEPT) to predict the enhancer-promoter interactions in a new cell line by using the cross-cell information and Transfer learning. SEPT first learns the features of enhancers and promoters from DNA sequences with a convolutional neural network (CNN), then uses the gradient reversal layer of transfer learning to reduce the cell line-specific features while retaining the features associated with EPIs. When the locations of enhancers and promoters are provided in a new cell line, SEPT can successfully recognize EPIs in this new cell line based on labeled data of other cell lines. The experimental results show that SEPT can effectively learn the latent important EPI-related features shared between cell lines and achieves the best prediction performance in terms of AUC (the area under the receiver operating characteristic curve). CONCLUSIONS SEPT is an effective method for predicting EPIs in a new cell line. The domain adversarial architecture of transfer learning used in SEPT can learn the latent EPI features shared among cell lines from all other existing labeled data. It can be expected that SEPT will be of interest to researchers concerned with biological interaction prediction.
Affiliation(s)
- Fang Jing
- Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, 127 West Youyi Road, Xi’an, 710072 Shaanxi China
| | - Shao-Wu Zhang
- Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, 127 West Youyi Road, Xi’an, 710072 Shaanxi China
| | - Shihua Zhang
- NCMIS, CEMS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, 55 Zhongguancun East Road, Beijing, 10090 China
- School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, 100049 China
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, 650223 China
| |
|
40
|
Jaroszewicz A, Ernst J. An integrative approach for fine-mapping chromatin interactions. Bioinformatics 2020; 36:1704-1711. [PMID: 31742318 DOI: 10.1093/bioinformatics/btz843] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 04/11/2019] [Revised: 09/30/2019] [Accepted: 11/16/2019] [Indexed: 01/17/2023]
Abstract
MOTIVATION Chromatin interactions play an important role in genome architecture and gene regulation. The Hi-C assay generates such interaction maps genome-wide, but at relatively low resolutions (e.g. 5-25 kb), which is substantially coarser than the resolution of transcription factor binding sites or open chromatin sites that are potential sources of such interactions. RESULTS To predict the sources of Hi-C-identified interactions at a high resolution (e.g. 100 bp), we developed a computational method that integrates data from DNase-seq and ChIP-seq of TFs and histone marks. Our method, χ-CNN, uses these data to first train a convolutional neural network (CNN) to discriminate between called Hi-C interactions and non-interactions. χ-CNN then predicts the high-resolution source of each Hi-C interaction using a feature attribution method. We show that these predictions recover the original Hi-C peaks after being extended to a coarser resolution. We also show that χ-CNN predictions are enriched for evolutionarily conserved bases, eQTLs and CTCF motifs, supporting their biological significance. χ-CNN provides an approach for analyzing important aspects of genome architecture and gene regulation at a higher resolution than previously possible. AVAILABILITY AND IMPLEMENTATION χ-CNN software is available on GitHub (https://github.com/ernstlab/X-CNN). SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Artur Jaroszewicz
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Biological Chemistry, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Jason Ernst
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Biological Chemistry, University of California, Los Angeles, Los Angeles, CA 90095, USA; Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research, University of California, Los Angeles, Los Angeles, CA 90095, USA; Computer Science Department, University of California, Los Angeles, Los Angeles, CA 90095, USA; Jonsson Comprehensive Cancer Center, University of California, Los Angeles, Los Angeles, CA 90095, USA; Molecular Biology Institute, University of California, Los Angeles, Los Angeles, CA 90095, USA
| |
|
41
|
Talukder A, Saadat S, Li X, Hu H. EPIP: a novel approach for condition-specific enhancer-promoter interaction prediction. Bioinformatics 2020; 35:3877-3883. [PMID: 31410461 DOI: 10.1093/bioinformatics/btz641] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 10/12/2018] [Revised: 07/12/2019] [Accepted: 08/11/2019] [Indexed: 11/12/2022]
Abstract
MOTIVATION The identification of enhancer-promoter interactions (EPIs), especially condition-specific ones, is important for the study of gene transcriptional regulation. Existing experimental approaches for EPI identification are still expensive, and available computational methods either do not consider condition-specific EPIs or perform poorly in predicting them. RESULTS We developed a novel computational method called EPIP to reliably predict EPIs, especially condition-specific ones. EPIP is capable of predicting interactions in samples with limited data as well as in samples with abundant data. Tested on more than eight cell lines, EPIP reliably identifies EPIs, with an average area under the receiver operating characteristic curve of 0.95 and an average area under the precision-recall curve of 0.73. Tested on condition-specific EPIs, EPIP correctly identified 99.26% of them. EPIP also achieves better accuracy than two recently developed methods. AVAILABILITY AND IMPLEMENTATION The EPIP tool is freely available at http://www.cs.ucf.edu/~xiaoman/EPIP/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
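The two evaluation metrics quoted in this abstract can be illustrated with a minimal sketch. This is not the EPIP evaluation code: the function names `auroc` and `average_precision`, the toy labels and scores, and the use of average precision as a step-wise estimate of the area under the precision-recall curve are assumptions made here for illustration.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a random positive scores higher than
    a random negative, with ties counted as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise estimate of the area under the precision-recall curve.

    Assumes scores are distinct; tied scores are ranked arbitrarily.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive")
    # Rank candidate pairs from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


# Toy data (hypothetical): 1 = interacting pair, 0 = non-interacting pair.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
print(round(auroc(labels, scores), 4), round(average_precision(labels, scores), 4))
```

Because negatives usually far outnumber true interactions in EPI datasets, the precision-recall area tends to be the more demanding of the two numbers, which is consistent with the gap between 0.95 and 0.73 reported above.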
Affiliation(s)
- Amlan Talukder
- Department of Computer Science, University of Central Florida, Orlando, FL, USA
| | - Samaneh Saadat
- Department of Computer Science, University of Central Florida, Orlando, FL, USA
| | - Xiaoman Li
- Burnett School of Biomedical Science, College of Medicine, University of Central Florida, Orlando, FL, USA
| | - Haiyan Hu
- Department of Computer Science, University of Central Florida, Orlando, FL, USA
| |
|
42
|
Gorkin DU, Barozzi I, Zhao Y, Zhang Y, Huang H, Lee AY, Li B, Chiou J, Wildberg A, Ding B, Zhang B, Wang M, Strattan JS, Davidson JM, Qiu Y, Afzal V, Akiyama JA, Plajzer-Frick I, Novak CS, Kato M, Garvin TH, Pham QT, Harrington AN, Mannion BJ, Lee EA, Fukuda-Yuzawa Y, He Y, Preissl S, Chee S, Han JY, Williams BA, Trout D, Amrhein H, Yang H, Cherry JM, Wang W, Gaulton K, Ecker JR, Shen Y, Dickel DE, Visel A, Pennacchio LA, Ren B. An atlas of dynamic chromatin landscapes in mouse fetal development. Nature 2020; 583:744-751. [PMID: 32728240 PMCID: PMC7398618 DOI: 10.1038/s41586-020-2093-3] [Citation(s) in RCA: 229] [Impact Index Per Article: 45.8] [Received: 08/08/2017] [Accepted: 06/11/2019] [Indexed: 02/08/2023]
Abstract
The Encyclopedia of DNA Elements (ENCODE) project has established a genomic resource for mammalian development, profiling a diverse panel of mouse tissues at 8 developmental stages from 10.5 days after conception until birth, including transcriptomes, methylomes and chromatin states. Here we systematically examined the state and accessibility of chromatin in the developing mouse fetus. In total we performed 1,128 chromatin immunoprecipitation with sequencing (ChIP-seq) assays for histone modifications and 132 assay for transposase-accessible chromatin using sequencing (ATAC-seq) assays for chromatin accessibility across 72 distinct tissue-stages. We used integrative analysis to develop a unified set of chromatin state annotations, infer the identities of dynamic enhancers and key transcriptional regulators, and characterize the relationship between chromatin state and accessibility during developmental gene regulation. We also leveraged these data to link enhancers to putative target genes and demonstrate tissue-specific enrichments of sequence variants associated with disease in humans. The mouse ENCODE data sets provide a compendium of resources for biomedical researchers and achieve, to our knowledge, the most comprehensive view of chromatin dynamics during mammalian fetal development to date.
Affiliation(s)
- David U Gorkin
- Ludwig Institute for Cancer Research, La Jolla, CA, USA
- Center for Epigenomics, University of California, San Diego School of Medicine, La Jolla, CA, USA
| | - Iros Barozzi
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Department of Surgery and Cancer, Imperial College London, London, UK
| | - Yuan Zhao
- Ludwig Institute for Cancer Research, La Jolla, CA, USA
- Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, CA, USA
| | - Yanxiao Zhang
- Ludwig Institute for Cancer Research, La Jolla, CA, USA
| | - Hui Huang
- Ludwig Institute for Cancer Research, La Jolla, CA, USA
- Biomedical Sciences Graduate Program, University of California, San Diego School of Medicine, La Jolla, CA, USA
| | - Ah Young Lee
- Ludwig Institute for Cancer Research, La Jolla, CA, USA
| | - Bin Li
- Ludwig Institute for Cancer Research, La Jolla, CA, USA
| | - Joshua Chiou
- Biomedical Sciences Graduate Program, University of California, San Diego School of Medicine, La Jolla, CA, USA
- Department of Pediatrics, University of California, San Diego School of Medicine, La Jolla, CA, USA
| | - Andre Wildberg
- Department of Cellular and Molecular Medicine, University of California, San Diego School of Medicine, La Jolla, CA, USA
| | - Bo Ding
- Department of Cellular and Molecular Medicine, University of California, San Diego School of Medicine, La Jolla, CA, USA
| | - Bo Zhang
- Department of Biochemistry and Molecular Biology, Penn State School of Medicine, Hershey, PA, USA
| | - Mengchi Wang
- Department of Cellular and Molecular Medicine, University of California, San Diego School of Medicine, La Jolla, CA, USA
| | - J Seth Strattan
- Stanford University School of Medicine, Department of Genetics, Stanford, CA, USA
| | - Jean M Davidson
- Stanford University School of Medicine, Department of Genetics, Stanford, CA, USA
| | - Yunjiang Qiu
- Ludwig Institute for Cancer Research, La Jolla, CA, USA
- Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, CA, USA
| | - Veena Afzal
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Jennifer A Akiyama
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Ingrid Plajzer-Frick
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Catherine S Novak
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Momoe Kato
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Tyler H Garvin
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Quan T Pham
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Anne N Harrington
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Brandon J Mannion
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Elizabeth A Lee
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Yoko Fukuda-Yuzawa
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Yupeng He
- Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, CA, USA
- Genomic Analysis Laboratory, Salk Institute for Biological Studies, La Jolla, CA, USA
| | - Sebastian Preissl
- Ludwig Institute for Cancer Research, La Jolla, CA, USA
- Center for Epigenomics, University of California, San Diego School of Medicine, La Jolla, CA, USA
| | - Sora Chee
- Ludwig Institute for Cancer Research, La Jolla, CA, USA
| | - Jee Yun Han
- Center for Epigenomics, University of California, San Diego School of Medicine, La Jolla, CA, USA
| | - Brian A Williams
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
| | - Diane Trout
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
| | - Henry Amrhein
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
| | - Hongbo Yang
- Department of Biochemistry and Molecular Biology, Penn State School of Medicine, Hershey, PA, USA
| | - J Michael Cherry
- Stanford University School of Medicine, Department of Genetics, Stanford, CA, USA
| | - Wei Wang
- Department of Cellular and Molecular Medicine, University of California, San Diego School of Medicine, La Jolla, CA, USA
| | - Kyle Gaulton
- Department of Pediatrics, University of California, San Diego School of Medicine, La Jolla, CA, USA
| | - Joseph R Ecker
- Genomic Analysis Laboratory, Salk Institute for Biological Studies, La Jolla, CA, USA
- Howard Hughes Medical Institute, Salk Institute for Biological Studies, La Jolla, CA, USA
| | - Yin Shen
- Institute for Human Genetics and University of California, San Francisco, San Francisco, CA, USA
- Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
| | - Diane E Dickel
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Axel Visel
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA.
- US Department of Energy Joint Genome Institute, Berkeley, CA, USA.
- School of Natural Sciences, University of California, Merced, Merced, CA, USA.
| | - Len A Pennacchio
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA.
- US Department of Energy Joint Genome Institute, Berkeley, CA, USA.
- Comparative Biochemistry Program, University of California, Berkeley, Berkeley, CA, USA.
| | - Bing Ren
- Ludwig Institute for Cancer Research, La Jolla, CA, USA.
- Center for Epigenomics, University of California, San Diego School of Medicine, La Jolla, CA, USA.
- Department of Cellular and Molecular Medicine, University of California, San Diego School of Medicine, La Jolla, CA, USA.
- Institute of Genomic Medicine, University of California, San Diego School of Medicine, La Jolla, CA, USA.
- Moores Cancer Center, University of California, San Diego School of Medicine, La Jolla, CA, USA.
| |
|
43
|
Dong K, Zhang S. Joint reconstruction of cis-regulatory interaction networks across multiple tissues using single-cell chromatin accessibility data. Brief Bioinform 2020; 22:5860691. [PMID: 32578841 PMCID: PMC8138825 DOI: 10.1093/bib/bbaa120] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 03/20/2020] [Revised: 05/16/2020] [Accepted: 05/18/2020] [Indexed: 12/11/2022]
Abstract
The rapid accumulation of single-cell chromatin accessibility data offers a unique opportunity to investigate common and specific regulatory mechanisms across different cell types. However, existing methods for cis-regulatory network reconstruction using single-cell chromatin accessibility data were designed only for cells belonging to one cell type, and the resulting networks may not be directly comparable because cell numbers differ across cell types. Here, we adopt a computational method to jointly reconstruct cis-regulatory interaction maps (JRIM) of multiple cell populations based on patterns of co-accessibility in single-cell data. We applied JRIM to explore common and specific regulatory interactions across multiple tissues from a single-cell ATAC-seq dataset containing ~80 000 cells across 13 mouse tissues. Reconstructed common interactions among the 13 tissues indeed relate to basic biological functions, and individual cis-regulatory networks show strong tissue specificity and functional relevance. More importantly, tissue-specific regulatory interactions are mediated by coordination of histone modifications and tissue-related TFs, and many of them may reveal novel regulatory mechanisms.
Affiliation(s)
- Kangning Dong
- Academy of Mathematics and Systems Science, Chinese Academy of Sciences
| | - Shihua Zhang
- Academy of Mathematics and Systems Science, Chinese Academy of Sciences
| |
|
44
|
Chen T, Tyagi S. Integrative computational epigenomics to build data-driven gene regulation hypotheses. Gigascience 2020; 9:giaa064. [PMID: 32543653 PMCID: PMC7297091 DOI: 10.1093/gigascience/giaa064] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 03/25/2020] [Revised: 05/25/2020] [Accepted: 05/26/2020] [Indexed: 12/20/2022]
Abstract
BACKGROUND Diseases are complex phenotypes often arising as an emergent property of a non-linear network of genetic and epigenetic interactions. To translate this resulting state into a causal relationship with a subset of regulatory features, many experiments deploy an array of laboratory assays from multiple modalities. Often, each of these resulting datasets is large, heterogeneous, and noisy. Thus, it is non-trivial to unify these complex datasets into an interpretable phenotype. Although recent methods address this problem with varying degrees of success, they are constrained by their scopes or limitations. Therefore, an important gap in the field is the lack of a universal data harmonizer with the capability to arbitrarily integrate multi-modal datasets. RESULTS In this review, we perform a critical analysis of methods with the explicit aim of harmonizing data, as opposed to case-specific integration. This revealed that matrix factorization, latent variable analysis, and deep learning are potent strategies. Finally, we describe the properties of an ideal universal data harmonization framework. CONCLUSIONS A sufficiently advanced universal harmonizer has major medical implications: (1) identifying the dysregulated biological pathways responsible for a disease is a powerful diagnostic tool; (2) investigating these pathways further allows the biological community to better understand a disease's mechanisms; and (3) precision medicine also benefits from developments in this area, particularly in the context of the growing field of selective epigenome editing, which can suppress or induce a desired phenotype.
Affiliation(s)
- Tyrone Chen
- 25 Rainforest Walk, School of Biological Sciences, Monash University, Clayton, VIC 3800, Australia
| | - Sonika Tyagi
- 25 Rainforest Walk, School of Biological Sciences, Monash University, Clayton, VIC 3800, Australia
| |
|
45
|
Machine learning uncovers cell identity regulator by histone code. Nat Commun 2020; 11:2696. [PMID: 32483223 PMCID: PMC7264183 DOI: 10.1038/s41467-020-16539-4] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 02/13/2019] [Accepted: 05/09/2020] [Indexed: 01/13/2023]
Abstract
Conversion between cell types, e.g., by induced expression of master transcription factors, holds great promise for cellular therapy. Our ability to manipulate cell identity is constrained by incomplete information on cell identity genes (CIGs) and their expression regulation. Here, we develop CEFCIG, an artificial intelligence framework to uncover CIGs and further define their master regulators. On the basis of machine learning, CEFCIG reveals unique histone codes for transcriptional regulation of reported CIGs, and utilizes these codes to predict CIGs and their master regulators with high accuracy. Applying CEFCIG to 1,005 epigenetic profiles, our analysis uncovers the landscape of the regulation network for identity genes in individual cell or tissue types. Together, this work provides insights into cell identity regulation, and delivers a powerful technique to facilitate regenerative medicine.
|
46
|
Hu X, Feng Z, Zhang X, Liu L, Wang S. The Identification of Metal Ion Ligand-Binding Residues by Adding the Reclassified Relative Solvent Accessibility. Front Genet 2020; 11:214. [PMID: 32265982 PMCID: PMC7096583 DOI: 10.3389/fgene.2020.00214] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 10/09/2019] [Accepted: 02/24/2020] [Indexed: 11/13/2022]
Abstract
Many proteins carry out their specific functions by binding specific metal ion ligands during a cell's life cycle. The ability to correctly identify metal ion ligand-binding residues is valuable for human health and for the design of molecular drugs. Precisely identifying these residues, however, remains challenging. We present an improved computational approach for predicting the binding residues of 10 metal ion ligands (Zn2+, Cu2+, Fe2+, Fe3+, Co2+, Ca2+, Mg2+, Mn2+, Na+, and K+) by adding reclassified relative solvent accessibility (RSA). The best accuracy in fivefold cross-validation was higher than 77.9%, which was about 16% higher than the previous result on the same dataset. We found that different reclassifications of the RSA information make different contributions to the identification of specific ligand-binding residues. Our study provides additional understanding of the effect of RSA on the identification of metal ion ligand-binding residues.
Affiliation(s)
| | - Zhenxing Feng
- College of Sciences, Inner Mongolia University of Technology, Hohhot, China
| | - Xiaojin Zhang
- College of Sciences, Inner Mongolia University of Technology, Hohhot, China
| | | | | |
|
47
|
Kim HJ, Osteil P, Humphrey SJ, Cinghu S, Oldfield AJ, Patrick E, Wilkie EE, Peng G, Suo S, Jothi R, Tam PPL, Yang P. Transcriptional network dynamics during the progression of pluripotency revealed by integrative statistical learning. Nucleic Acids Res 2020; 48:1828-1842. [PMID: 31853542 PMCID: PMC7038952 DOI: 10.1093/nar/gkz1179] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 08/26/2019] [Revised: 12/02/2019] [Accepted: 12/09/2019] [Indexed: 12/12/2022]
Abstract
The developmental potential of cells, termed pluripotency, is highly dynamic and progresses through a continuum of naive, formative and primed states. Pluripotency progression of mouse embryonic stem cells (ESCs) from naive to formative and primed state is governed by transcription factors (TFs) and their target genes. Genomic techniques have uncovered a multitude of TF binding sites in ESCs, yet a major challenge lies in identifying target genes from functional binding sites and reconstructing dynamic transcriptional networks underlying pluripotency progression. Here, we integrated time-resolved ‘trans-omic’ datasets together with TF binding profiles and chromatin conformation data to identify target genes of a panel of TFs. Our analyses revealed that naive TF target genes are more likely to be TFs themselves than those of formative TFs, suggesting denser hierarchies among naive TFs. We also discovered that formative TF target genes are marked by permissive epigenomic signatures in the naive state, indicating that they are poised for expression prior to the initiation of pluripotency transition to the formative state. Finally, our reconstructed transcriptional networks pinpointed the precise timing from naive to formative pluripotency progression and enabled the spatiotemporal mapping of differentiating ESCs to their in vivo counterparts in developing embryos.
Affiliation(s)
- Hani Jieun Kim
- Charles Perkins Centre, School of Mathematics and Statistics, University of Sydney, Sydney, NSW 2006, Australia; Computational Systems Biology Group, Children's Medical Research Institute, University of Sydney, Westmead, NSW 2145, Australia; School of Medical Sciences, Faculty of Medicine and Health, University of Sydney, NSW 2006, Australia
- Pierre Osteil
- School of Medical Sciences, Faculty of Medicine and Health, University of Sydney, NSW 2006, Australia; Embryology Unit, Children's Medical Research Institute, University of Sydney, Westmead, NSW 2145, Australia
- Sean J Humphrey
- Charles Perkins Centre, School of Life and Environmental Sciences, University of Sydney, Sydney, NSW 2006, Australia
- Senthilkumar Cinghu
- Epigenetics & Stem Cell Biology Laboratory, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, NC 27709, USA
- Andrew J Oldfield
- Institute of Human Genetics, CNRS, University of Montpellier, Montpellier, France
- Ellis Patrick
- Charles Perkins Centre, School of Mathematics and Statistics, University of Sydney, Sydney, NSW 2006, Australia; School of Medical Sciences, Faculty of Medicine and Health, University of Sydney, NSW 2006, Australia; Westmead Institute for Medical Research, University of Sydney, Westmead, NSW 2145, Australia
- Emilie E Wilkie
- Embryology Unit, Children's Medical Research Institute, University of Sydney, Westmead, NSW 2145, Australia
- Guangdun Peng
- CAS Key Laboratory of Regenerative Biology, Guangdong Provincial Key Laboratory of Stem Cell and Regenerative Medicine, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China, and Guangzhou Regenerative Medicine and Health Guangdong Laboratory (GRMH-GDL), Guangzhou 510005, China
- Shengbao Suo
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Harvard T.H. Chan School of Public Health, Boston, MA 02215, USA
- Raja Jothi
- Epigenetics & Stem Cell Biology Laboratory, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, NC 27709, USA
- Patrick P L Tam
- School of Medical Sciences, Faculty of Medicine and Health, University of Sydney, NSW 2006, Australia; Embryology Unit, Children's Medical Research Institute, University of Sydney, Westmead, NSW 2145, Australia
- Pengyi Yang
- Charles Perkins Centre, School of Mathematics and Statistics, University of Sydney, Sydney, NSW 2006, Australia; Computational Systems Biology Group, Children's Medical Research Institute, University of Sydney, Westmead, NSW 2145, Australia; School of Medical Sciences, Faculty of Medicine and Health, University of Sydney, NSW 2006, Australia
48
Xu H, Zhang S, Yi X, Plewczynski D, Li MJ. Exploring 3D chromatin contacts in gene regulation: The evolution of approaches for the identification of functional enhancer-promoter interaction. Comput Struct Biotechnol J 2020; 18:558-570. [PMID: 32226593 PMCID: PMC7090358 DOI: 10.1016/j.csbj.2020.02.013] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Received: 08/31/2019] [Revised: 02/21/2020] [Accepted: 02/22/2020] [Indexed: 12/12/2022] Open
Abstract
Mechanisms underlying gene regulation are key to understanding how multicellular organisms with various cell types develop from the same genetic blueprint. Dynamic interactions between enhancers and genes play central roles in controlling gene transcription, but the determinants that link functional enhancer-promoter pairs remain elusive. A major challenge is the lack of a reliable approach to detect and verify functional enhancer-promoter interactions (EPIs). In this review, we summarize the current methods for detecting EPIs and, by assessing the merits and drawbacks of these methods, describe how developing techniques facilitate the identification of EPIs. We also review recent state-of-the-art EPI prediction methods in terms of their rationale, data usage and characterization. Furthermore, we briefly discuss the evolving strategies for validating functional EPIs.
Affiliation(s)
- Hang Xu
- 2011 Collaborative Innovation Center of Tianjin for Medical Epigenetics, Tianjin Key Laboratory of Medical Epigenetics, Tianjin Medical University, Tianjin, China
- Shijie Zhang
- 2011 Collaborative Innovation Center of Tianjin for Medical Epigenetics, Tianjin Key Laboratory of Medical Epigenetics, Tianjin Medical University, Tianjin, China
- Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
- Xianfu Yi
- School of Biomedical Engineering, Tianjin Medical University, Tianjin, China
- Dariusz Plewczynski
- Centre of New Technologies, University of Warsaw, Banacha 2c, 02-097 Warsaw, Poland
- Faculty of Mathematics and Information Science, Warsaw University of Technology, Koszykowa 75, 00-662 Warsaw, Poland
- Mulin Jun Li
- 2011 Collaborative Innovation Center of Tianjin for Medical Epigenetics, Tianjin Key Laboratory of Medical Epigenetics, Tianjin Medical University, Tianjin, China
- Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
49
Moore JE, Pratt HE, Purcaro MJ, Weng Z. A curated benchmark of enhancer-gene interactions for evaluating enhancer-target gene prediction methods. Genome Biol 2020; 21:17. [PMID: 31969180 PMCID: PMC6977301 DOI: 10.1186/s13059-019-1924-8] [Citation(s) in RCA: 68] [Impact Index Per Article: 13.6] [Received: 09/05/2019] [Accepted: 12/23/2019] [Indexed: 01/07/2023] Open
Abstract
BACKGROUND: Many genome-wide collections of candidate cis-regulatory elements (cCREs) have been defined using genomic and epigenomic data, but it remains a major challenge to connect these elements to their target genes.
RESULTS: To facilitate the development of computational methods for predicting target genes, we develop a Benchmark of candidate Enhancer-Gene Interactions (BENGI) by integrating the recently developed Registry of cCREs with experimentally derived genomic interactions. We use BENGI to test several published computational methods for linking enhancers with genes, including signal correlation and the TargetFinder and PEP supervised learning methods. We find that while TargetFinder is the best-performing method, it is only modestly better than a baseline distance method for most benchmark datasets when trained and tested with the same cell type and that TargetFinder often does not outperform the distance method when applied across cell types.
CONCLUSIONS: Our results suggest that current computational methods need to be improved and that BENGI presents a useful framework for method development and testing.
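The abstract above compares supervised methods against a baseline distance method without spelling the baseline out. A minimal sketch of one plausible form of such a baseline follows: each candidate enhancer-gene pair is scored by the inverse of the linear genomic distance between the enhancer and the gene's transcription start site, so closer pairs rank higher. The `Pair` type, its field names, the use of the enhancer midpoint, and the `+ 1` offset are all assumptions made for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    """Hypothetical candidate enhancer-gene pair on a single chromosome."""
    enhancer_mid: int  # midpoint of the candidate enhancer, in bp (assumed convention)
    tss: int           # transcription start site of the candidate gene, in bp
    name: str = ""


def distance_score(pair: Pair) -> float:
    """Score a pair by inverse linear distance; closer pairs score higher."""
    # The +1 avoids division by zero when the enhancer midpoint equals the TSS.
    return 1.0 / (abs(pair.enhancer_mid - pair.tss) + 1)


def rank_by_distance(pairs: list[Pair]) -> list[Pair]:
    """Return pairs ordered from most to least likely under the distance baseline."""
    return sorted(pairs, key=distance_score, reverse=True)


candidates = [
    Pair(enhancer_mid=150_000, tss=100_000, name="far"),
    Pair(enhancer_mid=102_000, tss=100_000, name="near"),
    Pair(enhancer_mid=90_000, tss=100_000, name="mid"),
]
ranked = rank_by_distance(candidates)
```

A baseline like this needs no training data, which is why it is a natural yardstick when a supervised method is applied across cell types.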
Affiliation(s)
- Jill E Moore
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA, 01605, USA
- Henry E Pratt
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA, 01605, USA
- Michael J Purcaro
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA, 01605, USA
- Zhiping Weng
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA, 01605, USA.
50
Xiao M, Zhuang Z, Pan W. Local Epigenomic Data are more Informative than Local Genome Sequence Data in Predicting Enhancer-Promoter Interactions Using Neural Networks. Genes (Basel) 2019; 11:E41. [PMID: 31905774 PMCID: PMC7016741 DOI: 10.3390/genes11010041] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 11/29/2019] [Revised: 12/23/2019] [Accepted: 12/26/2019] [Indexed: 12/13/2022] Open
Abstract
Enhancer-promoter interactions (EPIs) are crucial for transcriptional regulation. Mapping such interactions proves useful for understanding disease regulations and discovering risk genes in genome-wide association studies. Some previous studies showed that machine learning methods, as computational alternatives to costly experimental approaches, performed well in predicting EPIs from local sequence and/or local epigenomic data. In particular, deep learning methods were demonstrated to outperform traditional machine learning methods, and using DNA sequence data alone could perform either better than or almost as well as only utilizing epigenomic data. However, most, if not all, of these previous studies were based on randomly splitting enhancer-promoter pairs as training, tuning, and test data, which has recently been pointed out to be problematic; due to multiple and duplicating/overlapping enhancers (and promoters) in enhancer-promoter pairs in EPI data, such random splitting does not lead to independent training, tuning, and test data, thus resulting in model over-fitting and over-estimating predictive performance. Here, after correcting this design issue, we extensively studied the performance of various deep learning models with local sequence and epigenomic data around enhancer-promoter pairs. Our results confirmed much lower performance using either sequence or epigenomic data alone, or both, than reported previously. We also demonstrated that local epigenomic features were more informative than local sequence data. Our results were based on an extensive exploration of many convolutional neural network (CNN) and feed-forward neural network (FNN) structures, and of gradient boosting as a representative of traditional machine learning.
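The leakage problem described above can be illustrated with a small sketch. The abstract does not state how the authors corrected the split, so the following is only one possible remedy, not their procedure: pairs that share an enhancer or a promoter are grouped into connected components, and whole components are assigned to either the training or the test set. The function names, the `(enhancer_id, promoter_id)` tuple format, and the greedy assignment are assumptions for illustration. The sketch matches elements by identifier only; handling enhancers that merely overlap in genomic coordinates would additionally require merging intervals before grouping.

```python
import random
from collections import defaultdict


def group_pairs(pairs):
    """Group (enhancer_id, promoter_id) pairs into connected components.

    Two pairs end up in the same group if they share an enhancer or a
    promoter, directly or through a chain of other pairs.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    # Tag ids so an enhancer and a promoter with the same name stay distinct.
    for enh, prom in pairs:
        union(("E", enh), ("P", prom))

    groups = defaultdict(list)
    for enh, prom in pairs:
        groups[find(("E", enh))].append((enh, prom))
    return list(groups.values())


def grouped_split(pairs, test_fraction=0.2, seed=0):
    """Split pairs so that no enhancer or promoter appears in both sets."""
    groups = group_pairs(pairs)
    random.Random(seed).shuffle(groups)
    target = test_fraction * len(pairs)
    train, test = [], []
    for group in groups:
        # Greedy fill: whole groups go to test until the target size is reached.
        (test if len(test) < target else train).extend(group)
    return train, test


pairs = [("e1", "p1"), ("e1", "p2"), ("e2", "p2"), ("e3", "p3"), ("e4", "p4")]
train, test = grouped_split(pairs, test_fraction=0.4, seed=1)
```

Because whole groups are moved at once, the realized test fraction is only approximate. A purely random split of the same pairs could place `("e1", "p1")` in training and `("e1", "p2")` in test, which is the kind of non-independence the abstract warns about.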
Affiliation(s)
- Mengli Xiao
- Division of Biostatistics, University of Minnesota, Minneapolis, MN 55455, USA
- Zhong Zhuang
- Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN 55455, USA
- Wei Pan
- Division of Biostatistics, University of Minnesota, Minneapolis, MN 55455, USA