1
Saxena A, Nixon B, Boyd A, Evans J, Faraone SV. A Systematic Review of the Application of Graph Neural Networks to Extract Candidate Genes and Biological Associations. Am J Med Genet B Neuropsychiatr Genet 2025:e33031. [PMID: 40317893 DOI: 10.1002/ajmg.b.33031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2024] [Revised: 02/27/2025] [Accepted: 04/15/2025] [Indexed: 05/07/2025]
Abstract
The development of high-throughput technologies has resulted in the collection of large quantities of genomic and transcriptomic data. However, identifying disease-associated genes or networks from these data has remained an ongoing challenge. In recent years, graph neural networks (GNNs) have emerged as a promising analytical tool, but it is not well understood which characteristics of these models result in improved performance. We conducted a systematic search and review of publications that used GNNs to identify disease-associated biological interactions. Information was extracted about model characteristics and performance with the goal of examining the relationship between these factors and performance. Data leakage was found in 31% of these models. For node-level tasks, univariate positive associations were identified between model accuracy and use of hyperparameter optimization, data leakage via hyperparameter optimization, test set size, and total dataset size. Among graph-level tasks, an increase in AUC was identified in association with testing method and a decrease with optimization reporting. Data leakage may pose an issue for GNN-based approaches; the adoption of best practice guidelines and consistent reporting of model design would be beneficial for future studies.
Affiliation(s)
- Ankita Saxena
  - Department of Neuroscience and Physiology, State University of New York-Norton College of Medicine at Upstate Medical University, New York, USA
  - Department of Psychiatry and Behavioral Sciences, State University of New York-Norton College of Medicine at Upstate Medical University, New York, USA
- Bridgette Nixon
  - College of Medicine, MD Program, Norton College of Medicine at SUNY Upstate Medical University, New York, USA
- Amelia Boyd
  - College of Medicine, MD Program, Norton College of Medicine at SUNY Upstate Medical University, New York, USA
- James Evans
  - Health Sciences Library, State University of New York-Upstate Medical University, New York, USA
- Stephen V Faraone
  - Department of Neuroscience and Physiology, State University of New York-Norton College of Medicine at Upstate Medical University, New York, USA
  - Department of Psychiatry and Behavioral Sciences, State University of New York-Norton College of Medicine at Upstate Medical University, New York, USA
2
Wang Z, Yuan H, Yan J, Liu J. Identification, characterization, and design of plant genome sequences using deep learning. The Plant Journal: for Cell and Molecular Biology 2025; 121:e17190. [PMID: 39666835 DOI: 10.1111/tpj.17190] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2024] [Revised: 11/11/2024] [Accepted: 11/23/2024] [Indexed: 12/14/2024]
Abstract
Due to its excellent performance in processing large amounts of data and capturing complex non-linear relationships, deep learning has been widely applied in many fields of plant biology. Here we first review the application of deep learning in analyzing genome sequences to predict gene expression, chromatin interactions, and epigenetic features (open chromatin, transcription factor binding sites, and methylation sites) in plants. Then, current motif mining and functional component design and synthesis based on generative adversarial networks, large models, and attention mechanisms are elaborated in detail. The progress of protein structure and function prediction, genomic prediction, and large model applications based on deep learning is also discussed. Finally, this work provides prospects for the future development of deep learning in plants with regard to multiple omics data, algorithm optimization, large language models, sequence design, and intelligent breeding.
Affiliation(s)
- Zhenye Wang
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
  - Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, 430070, China
  - College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Hao Yuan
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
  - Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, 430070, China
  - College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Jianbing Yan
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
  - Hubei Hongshan Laboratory, Wuhan, 430070, China
- Jianxiao Liu
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
  - Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, 430070, China
  - College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
  - Hubei Hongshan Laboratory, Wuhan, 430070, China
3
Murtaza G, Wagner J, Zook JM, Singh R. GrapHiC: An integrative graph based approach for imputing missing Hi-C reads. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; PP:10.1109/TCBB.2024.3477909. [PMID: 39392732 PMCID: PMC12034241 DOI: 10.1109/tcbb.2024.3477909] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/13/2024]
Abstract
Hi-C experiments allow researchers to study and understand the 3D genome organization and its regulatory function. Unfortunately, sequencing costs and technical constraints severely restrict access to high-quality Hi-C data for many cell types. Existing frameworks rely on a sparse Hi-C dataset or cheaper-to-acquire ChIP-seq data to predict Hi-C contact maps with high read coverage. However, these methods fail to generalize to sparse or cross-cell-type inputs because they do not account for the contributions of epigenomic features or the impact of the structural neighborhood in predicting Hi-C reads. We propose GrapHiC, which combines Hi-C and ChIP-seq in a graph representation, allowing more accurate embedding of structural and epigenomic features. Each node represents a binned genomic region, and we assign edge weights using the observed Hi-C reads. Additionally, we embed ChIP-seq and relative positional information as node attributes, allowing our representation to capture structural neighborhoods and the contributions of proteins and their modifications for predicting Hi-C reads. We show that GrapHiC generalizes better than the current state-of-the-art on cross-cell-type settings and sparse Hi-C inputs. Moreover, we can utilize our framework to impute Hi-C reads even when no Hi-C contact map is available, thus making high-quality Hi-C data accessible for many cell types. Availability: https://github.com/rsinghlab/GrapHiC.
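The graph representation described in this abstract (binned regions as nodes, observed Hi-C reads as edge weights, ChIP-seq signal and relative position as node attributes) can be illustrated with a minimal sketch. This is not taken from the GrapHiC repository: the names `BinGraph` and `build_bin_graph`, the toy matrices, and the choice of a normalized bin index as the positional feature are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class BinGraph:
    # One feature vector per genomic bin: ChIP-seq signals followed by a relative position.
    node_features: list[list[float]]
    # Undirected edges keyed by (i, j) with i < j, weighted by the observed Hi-C read count.
    edges: dict[tuple[int, int], float]


def build_bin_graph(contacts: list[list[float]], chip_tracks: list[list[float]]) -> BinGraph:
    """Build a toy bin graph from a symmetric contact matrix and per-bin ChIP-seq tracks."""
    n = len(contacts)
    node_features = []
    for i in range(n):
        # Hypothetical positional encoding: bin index scaled to [0, 1].
        rel_pos = i / (n - 1) if n > 1 else 0.0
        node_features.append([track[i] for track in chip_tracks] + [rel_pos])
    edges = {}
    for i in range(n):
        for j in range(i + 1, n):
            if contacts[i][j] > 0:  # keep only bin pairs with observed reads
                edges[(i, j)] = float(contacts[i][j])
    return BinGraph(node_features, edges)


# Toy input: three bins, one ChIP-seq track.
contacts = [[0, 5, 1], [5, 0, 0], [1, 0, 0]]
chip_tracks = [[0.2, 0.9, 0.1]]
graph = build_bin_graph(contacts, chip_tracks)
```

A graph neural network would then pass messages along `graph.edges` to update the per-bin feature vectors; that step is omitted here because the abstract does not specify the architecture.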
4
Sexton C, Victor Paul S, Barth D, Han M. Genome wide clustering on integrated chromatin states and Micro-C contacts reveals chromatin interaction signatures. NAR Genom Bioinform 2024; 6:lqae136. [PMID: 39363891 PMCID: PMC11447530 DOI: 10.1093/nargab/lqae136] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2024] [Revised: 08/21/2024] [Accepted: 09/20/2024] [Indexed: 10/05/2024] Open
Abstract
We can now analyze 3D physical interactions of chromatin regions with chromatin conformation capture technologies, in addition to the 1D chromatin state annotations, but methods to integrate this information are lacking. We propose a method to integrate the chromatin state of interacting regions into a vector representation through the contact-weighted sum of chromatin states. Unsupervised clustering on integrated chromatin states and Micro-C contacts reveals common patterns of chromatin interaction signatures. This provides an integrated view of the complex dynamics of concurrent change occurring in chromatin state and in chromatin interaction, adding another layer of annotation beyond chromatin state or Hi-C contact separately.
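The contact-weighted sum mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each region's chromatin state is encoded as a vector (here one-hot), and that the integrated vector for region i is the sum over regions j of contact(i, j) times the state vector of j. The function name and the toy data are hypothetical, and any normalization the authors may apply is left out.

```python
def contact_weighted_states(contacts, states):
    """For each region, sum the state vectors of all regions weighted by contact frequency."""
    n_states = len(states[0])
    integrated = []
    for row in contacts:
        vec = [0.0] * n_states
        for weight, state in zip(row, states):
            for k, value in enumerate(state):
                vec[k] += weight * value
        integrated.append(vec)
    return integrated


# Toy input: three regions, two chromatin states (one-hot encoded).
states = [[1, 0], [0, 1], [0, 1]]
contacts = [[0, 2, 1], [2, 0, 4], [1, 4, 0]]
vectors = contact_weighted_states(contacts, states)
# vectors == [[0.0, 3.0], [2.0, 4.0], [1.0, 4.0]]
```

The resulting vectors could then be passed to any unsupervised clustering routine, in line with the clustering step the abstract describes.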
Affiliation(s)
- Corinne E Sexton
  - School of Life Sciences, University of Nevada, Las Vegas, NV 89154, USA
  - Nevada Institute of Personalized Medicine, University of Nevada, Las Vegas, NV 89154, USA
- Sylvia Victor Paul
  - School of Life Sciences, University of Nevada, Las Vegas, NV 89154, USA
  - Nevada Institute of Personalized Medicine, University of Nevada, Las Vegas, NV 89154, USA
- Dylan Barth
  - School of Life Sciences, University of Nevada, Las Vegas, NV 89154, USA
  - Nevada Institute of Personalized Medicine, University of Nevada, Las Vegas, NV 89154, USA
- Mira V Han
  - School of Life Sciences, University of Nevada, Las Vegas, NV 89154, USA
  - Nevada Institute of Personalized Medicine, University of Nevada, Las Vegas, NV 89154, USA
5
Lu W, Tang Y, Liu Y, Lin S, Shuai Q, Liang B, Zhang R, Cheng Y, Fang D. CatLearning: highly accurate gene expression prediction from histone mark. Brief Bioinform 2024; 25:bbae373. [PMID: 39073831 DOI: 10.1093/bib/bbae373] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2024] [Revised: 06/14/2024] [Accepted: 07/16/2024] [Indexed: 07/30/2024] Open
Abstract
Histone modifications, known as histone marks, are pivotal in regulating gene expression within cells. The vast array of potential combinations of histone marks presents a considerable challenge in decoding the regulatory mechanisms solely through biological experimental approaches. To overcome this challenge, we have developed a method called CatLearning. It utilizes a modified convolutional neural network architecture with a specialized adaptation Residual Network to quantitatively interpret histone marks and predict gene expression. This architecture integrates long-range histone information up to 500Kb and learns chromatin interaction features without 3D information. By using only one histone mark, CatLearning achieves a high level of accuracy. Furthermore, CatLearning predicts gene expression by simulating changes in histone modifications at enhancers and throughout the genome. These findings help comprehend the architecture of histone marks and develop diagnostic and therapeutic targets for diseases with epigenetic changes.
Affiliation(s)
- Weining Lu
  - Beijing National Research Center for Information Science and Technology, Tsinghua University, FIT Building, Haidian District, Beijing 100084, China
- Yin Tang
  - Liangzhu Laboratory, Zhejiang University, 1369 Wenyixi Road, Yuhang District, Hangzhou, Zhejiang, 311121, China
- Yu Liu
  - Life Sciences Institute, Zhejiang University, 866 Yuhangtang Road, Xihu District, Hangzhou, Zhejiang, 310058, China
- Shiyi Lin
  - Life Sciences Institute, Zhejiang University, 866 Yuhangtang Road, Xihu District, Hangzhou, Zhejiang, 310058, China
- Qifan Shuai
  - School of Electron and Computer, Southeast University Chengxian College, 371 Heyan Road, Qixia District, Nanjing, Jiangsu 210088, China
- Bin Liang
  - Department of Automation, Tsinghua University, 1 Tsinghua Garden, Haidian District, Beijing, 100084, China
- Rongqing Zhang
  - Zhejiang Provincial Key Laboratory of Applied Enzymology, Yangtze Delta Region Institute of Tsinghua University, 705 Yatai Road, Jiaxing 314006, China
- Yu Cheng
  - The Chinese University of Hong Kong, Shatin, NT, Hong Kong, 999077, China
- Dong Fang
  - Life Sciences Institute, Zhejiang University, 866 Yuhangtang Road, Xihu District, Hangzhou, Zhejiang, 310058, China
  - Department of Medical Oncology, The Second Affiliated Hospital, Zhejiang University School of Medicine, Key Laboratory of Cancer Prevention and Intervention, China National Ministry of Education, 88 Jiefang Road, Shangcheng District, Hangzhou, Zhejiang, 310009, China
6
Cui W, Long Q, Xiao M, Wang X, Feng G, Li X, Wang P, Zhou Y. Refining computational inference of gene regulatory networks: integrating knockout data within a multi-task framework. Brief Bioinform 2024; 25:bbae361. [PMID: 39082651 PMCID: PMC11289685 DOI: 10.1093/bib/bbae361] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2024] [Revised: 05/09/2024] [Accepted: 07/16/2024] [Indexed: 08/03/2024] Open
Abstract
Constructing accurate gene regulatory networks (GRNs), which reflect the dynamic governing process between genes, is critical to understanding diverse cellular processes and unveiling the complexities of biological systems. With the development of computer science, computational approaches have been applied to the GRN inference task. However, current methodologies face challenges in effectively utilizing existing topological information and prior knowledge of gene regulatory relationships, hindering the comprehensive understanding and accurate reconstruction of GRNs. In response, we propose a novel graph neural network (GNN)-based Multi-Task Learning framework for GRN reconstruction, namely MTLGRN. Specifically, we first encode the gene promoter sequences and the gene biological features and concatenate the corresponding feature representations. Then, we construct a multi-task learning framework including GRN reconstruction, gene knockout prediction, and gene expression matrix reconstruction. With joint training, MTLGRN can optimize the gene latent representations by integrating gene knockout information, promoter characteristics, and other biological attributes. Extensive experimental results demonstrate superior performance compared with state-of-the-art baselines on the GRN reconstruction task, efficiently leveraging biological knowledge and comprehensively understanding the gene regulatory relationships. MTLGRN also pioneered attempts to simulate gene knockouts on bulk data by incorporating gene knockout information.
Affiliation(s)
- Wentao Cui
  - Computer Network Information Center, Chinese Academy of Sciences, CAS Informatization Plaza No. 2 Dong Sheng Nan Lu, Haidian District, Beijing, 100083, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Shijingshan District, Beijing, 100049, China
- Qingqing Long
  - Computer Network Information Center, Chinese Academy of Sciences, CAS Informatization Plaza No. 2 Dong Sheng Nan Lu, Haidian District, Beijing, 100083, China
- Meng Xiao
  - Computer Network Information Center, Chinese Academy of Sciences, CAS Informatization Plaza No. 2 Dong Sheng Nan Lu, Haidian District, Beijing, 100083, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Shijingshan District, Beijing, 100049, China
- Xuezhi Wang
  - Computer Network Information Center, Chinese Academy of Sciences, CAS Informatization Plaza No. 2 Dong Sheng Nan Lu, Haidian District, Beijing, 100083, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Shijingshan District, Beijing, 100049, China
- Guihai Feng
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Shijingshan District, Beijing, 100049, China
  - State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, 1 Beichen West Road, Chaoyang District, Beijing, 100101, China
- Xin Li
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Shijingshan District, Beijing, 100049, China
  - State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, 1 Beichen West Road, Chaoyang District, Beijing, 100101, China
- Pengfei Wang
  - Computer Network Information Center, Chinese Academy of Sciences, CAS Informatization Plaza No. 2 Dong Sheng Nan Lu, Haidian District, Beijing, 100083, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Shijingshan District, Beijing, 100049, China
- Yuanchun Zhou
  - Computer Network Information Center, Chinese Academy of Sciences, CAS Informatization Plaza No. 2 Dong Sheng Nan Lu, Haidian District, Beijing, 100083, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Shijingshan District, Beijing, 100049, China
7
Yan F, Jiang L, Chen D, Ceccarelli M, Guo Y. Reinventing gene expression connectivity through regulatory and spatial structural empowerment via principal node aggregation graph neural network. Nucleic Acids Res 2024; 52:e60. [PMID: 38884259 PMCID: PMC11260459 DOI: 10.1093/nar/gkae514] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2024] [Accepted: 06/04/2024] [Indexed: 06/18/2024] Open
Abstract
The intricacies of the human genome, manifested as a complex network of genes, transcend conventional representations in text or numerical matrices. The intricate gene-to-gene relationships inherent in this complexity find a more suitable depiction in graph structures. In the pursuit of predicting gene expression, an endeavor shared by predecessors like the L1000 and Enformer methods, we introduce a novel spatial graph-neural network (GNN) approach. This innovative strategy incorporates graph features, encompassing both regulatory and structural elements. The regulatory elements include pair-wise gene correlation, biological pathways, protein-protein interaction networks, and transcription factor regulation. The spatial structural elements include chromosomal distance, histone modification and Hi-C inferred 3D genomic features. Principal Node Aggregation models, validated independently, emerge as frontrunners, demonstrating superior performance compared to traditional regression and other deep learning models. By embracing the spatial GNN paradigm, our method significantly advances the description of the intricate network of gene interactions, surpassing the performance, predictable scope, and initial requirements set by previous methods.
Affiliation(s)
- Fengyao Yan
  - Department of Public Health and Sciences, University of Miami, Miami, FL 33126, USA
  - Department of Computer Science, University of South Carolina, Columbia, SC 29201, USA
- Limin Jiang
  - Department of Public Health and Sciences, University of Miami, Miami, FL 33126, USA
- Danqian Chen
  - Department of Public Health and Sciences, University of Miami, Miami, FL 33126, USA
- Michele Ceccarelli
  - Department of Public Health and Sciences, University of Miami, Miami, FL 33126, USA
- Yan Guo
  - Department of Public Health and Sciences, University of Miami, Miami, FL 33126, USA
8
Suita Y, Bright H, Pu Y, Toruner MD, Idehen J, Tapinos N, Singh R. Machine learning on multiple epigenetic features reveals H3K27Ac as a driver of gene expression prediction across patients with glioblastoma. bioRxiv: the preprint server for biology 2024:2024.06.25.600585. [PMID: 38979226 PMCID: PMC11230286 DOI: 10.1101/2024.06.25.600585] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/10/2024]
Abstract
Cancer cells show remarkable plasticity and can switch lineages in response to the tumor microenvironment. Cellular plasticity drives invasiveness and metastasis and helps cancer cells to evade therapy by developing resistance to radiation and cytotoxic chemotherapy. Increased understanding of cell fate determination through epigenetic reprogramming is critical to discover how cancer cells achieve transcriptomic and phenotypic plasticity. Glioblastoma is a perfect example of cancer evolution where cells retain an inherent level of plasticity through activation or maintenance of progenitor developmental programs. However, the principles governing epigenetic drivers of cellular plasticity in glioblastoma remain poorly understood. Here, using machine learning (ML) we employ cross-patient prediction of transcript expression using a combination of epigenetic features (ATAC-seq, CTCF ChIP-seq, RNAPII ChIP-seq, H3K27Ac ChIP-seq, and RNA-seq) of glioblastoma stem cells (GSCs). We investigate different ML and deep learning (DL) models for this task and build our final pipeline using XGBoost. The model trained on one patient generalizes to another one suggesting that the epigenetic signals governing gene transcription are consistent across patients even if GSCs can be very different. We demonstrate that H3K27Ac is the epigenetic feature providing the most significant contribution to cross-patient prediction of gene expression. In addition, using H3K27Ac signals from patients-derived GSCs, we can predict gene expression of human neural crest stem cells suggesting a shared developmental epigenetic trajectory between subpopulations of these malignant and benign stem cells. Our cross-patient ML/DL models determine weighted patterns of influence of epigenetic marks on gene expression across patients with glioblastoma and between GSCs and neural crest stem cells. 
We propose that broader application of this analysis could reshape our view of glioblastoma tumor evolution and inform the design of new epigenetic targeting therapies.
Affiliation(s)
- Yusuke Suita
  - Laboratory of Cancer Epigenetics and Plasticity, Department of Neurosurgery, Brown University, Providence, RI 02903, USA
- Hardy Bright
  - Data Science Institute, Brown University, Providence, RI 02903, USA
- Yuan Pu
  - Center for Computational Molecular Biology, Brown University, Providence, RI 02903, USA
- Merih Deniz Toruner
  - Laboratory of Cancer Epigenetics and Plasticity, Department of Neurosurgery, Brown University, Providence, RI 02903, USA
  - Center for Computational Molecular Biology, Brown University, Providence, RI 02903, USA
- Jordan Idehen
  - Department of Computer Science, Brown University, Providence, RI 02903, USA
- Nikos Tapinos
  - Laboratory of Cancer Epigenetics and Plasticity, Department of Neurosurgery, Brown University, Providence, RI 02903, USA
  - Brown RNA Center, Brown University, Providence, RI 02903, USA
- Ritambhara Singh
  - Department of Computer Science, Brown University, Providence, RI 02903, USA
  - Center for Computational Molecular Biology, Brown University, Providence, RI 02903, USA
9
Gonzalez-Avalos E, Onodera A, Samaniego-Castruita D, Rao A, Ay F. Predicting gene expression state and prioritizing putative enhancers using 5hmC signal. Genome Biol 2024; 25:142. [PMID: 38825692 PMCID: PMC11145787 DOI: 10.1186/s13059-024-03273-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2023] [Accepted: 05/11/2024] [Indexed: 06/04/2024] Open
Abstract
BACKGROUND: Like its parent base 5-methylcytosine (5mC), 5-hydroxymethylcytosine (5hmC) is a direct epigenetic modification of cytosines in the context of CpG dinucleotides. 5hmC is the most abundant oxidized form of 5mC, generated through the action of TET dioxygenases at gene bodies of actively-transcribed genes and at active or lineage-specific enhancers. Although such enrichments are reported for 5hmC, to date, predictive models of gene expression state or putative regulatory regions for genes using 5hmC have not been developed. RESULTS: Here, by using only 5hmC enrichment in genic regions and their vicinity, we develop neural network models that predict gene expression state across 49 cell types. We show that our deep neural network models distinguish high vs low expression state utilizing only 5hmC levels and these predictive models generalize to unseen cell types. Further, in order to leverage 5hmC signal in distal enhancers for expression prediction, we employ an Activity-by-Contact model and also develop a graph convolutional neural network model with both utilizing Hi-C data and 5hmC enrichment to prioritize enhancer-promoter links. These approaches identify known and novel putative enhancers for key genes in multiple immune cell subsets. CONCLUSIONS: Our work highlights the importance of 5hmC in gene regulation through proximal and distal mechanisms and provides a framework to link it to genome function. With the recent advances in 6-letter DNA sequencing by short and long-read techniques, profiling of 5mC and 5hmC may be done routinely in the near future, hence, providing a broad range of applications for the methods developed here.
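For readers unfamiliar with the Activity-by-Contact idea named in this abstract, the sketch below shows the commonly used form of the score: for one gene, each candidate enhancer's activity is multiplied by its contact frequency with the gene's promoter, and the product is divided by the sum of such products over all candidate enhancers for that gene. It is an illustration only, not the authors' pipeline. The abstract indicates that 5hmC enrichment and Hi-C data are used, but how they enter the score is not stated there, so treating 5hmC as the activity term is an assumption, as are the function name and toy values.

```python
def abc_scores(activity, contact):
    """Activity-by-Contact scores for the candidate enhancers of a single gene.

    activity: enhancer id -> activity signal (assumed here to be a 5hmC enrichment value)
    contact:  enhancer id -> contact frequency with the gene's promoter (e.g. from Hi-C)
    """
    products = {enh: activity[enh] * contact[enh] for enh in activity}
    total = sum(products.values())
    if total == 0:
        # No active, contacting enhancers: every score is zero.
        return {enh: 0.0 for enh in products}
    return {enh: value / total for enh, value in products.items()}


# Toy input: a strong but weakly contacting enhancer and a weak but strongly contacting one.
activity = {"enh1": 4.0, "enh2": 1.0}
contact = {"enh1": 0.5, "enh2": 2.0}
scores = abc_scores(activity, contact)
# scores == {"enh1": 0.5, "enh2": 0.5}
```

Because the scores for one gene sum to 1, they can be read as each enhancer's relative share of that gene's activity-weighted contacts, which is what makes them usable for ranking enhancer-promoter links.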
Affiliation(s)
- Edahi Gonzalez-Avalos
  - La Jolla Institute for Immunology, 9420 Athena Circle, La Jolla, CA, 92037, USA
  - Bioinformatics and Systems Biology Graduate Program, University of California San Diego, La Jolla, CA, 92093, USA
- Atsushi Onodera
  - La Jolla Institute for Immunology, 9420 Athena Circle, La Jolla, CA, 92037, USA
  - Department of Immunology, Graduate School of Medicine, Chiba University, Chiba, 260-8670, Japan
- Daniela Samaniego-Castruita
  - La Jolla Institute for Immunology, 9420 Athena Circle, La Jolla, CA, 92037, USA
  - Biological Sciences Graduate Program, University of California San Diego, La Jolla, CA, 92093, USA
- Anjana Rao
  - La Jolla Institute for Immunology, 9420 Athena Circle, La Jolla, CA, 92037, USA
  - Bioinformatics and Systems Biology Graduate Program, University of California San Diego, La Jolla, CA, 92093, USA
  - Department of Pharmacology, University of California San Diego, La Jolla, CA, 92093, USA
  - Sanford Consortium for Regenerative Medicine, La Jolla, CA, 92093, USA
  - Moores Cancer Center, University of California San Diego, La Jolla, CA, 92093, USA
- Ferhat Ay
  - La Jolla Institute for Immunology, 9420 Athena Circle, La Jolla, CA, 92037, USA
  - Bioinformatics and Systems Biology Graduate Program, University of California San Diego, La Jolla, CA, 92093, USA
  - Moores Cancer Center, University of California San Diego, La Jolla, CA, 92093, USA
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, 92093, USA
10
Murphy D, Salataj E, Di Giammartino DC, Rodriguez-Hernaez J, Kloetgen A, Garg V, Char E, Uyehara CM, Ee LS, Lee U, Stadtfeld M, Hadjantonakis AK, Tsirigos A, Polyzos A, Apostolou E. 3D Enhancer-promoter networks provide predictive features for gene expression and coregulation in early embryonic lineages. Nat Struct Mol Biol 2024; 31:125-140. [PMID: 38053013 PMCID: PMC10897904 DOI: 10.1038/s41594-023-01130-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 10/24/2022] [Accepted: 09/18/2023] [Indexed: 12/07/2023]
Abstract
Mammalian embryogenesis commences with two pivotal and binary cell fate decisions that give rise to three essential lineages: the trophectoderm, the epiblast and the primitive endoderm. Although key signaling pathways and transcription factors that control these early embryonic decisions have been identified, the non-coding regulatory elements through which transcriptional regulators enact these fates remain understudied. Here, we characterize, at a genome-wide scale, enhancer activity and 3D connectivity in embryo-derived stem cell lines that represent each of the early developmental fates. We observe extensive enhancer remodeling and fine-scale 3D chromatin rewiring among the three lineages, which strongly associate with transcriptional changes, although distinct groups of genes are irresponsive to topological changes. In each lineage, a high degree of connectivity, or 'hubness', positively correlates with levels of gene expression and enriches for cell-type specific and essential genes. Genes within 3D hubs also show a significantly stronger probability of coregulation across lineages compared to genes in linear proximity or within the same contact domains. By incorporating 3D chromatin features, we build a predictive model for transcriptional regulation (3D-HiChAT) that outperforms models using only 1D promoter or proximal variables to predict levels and cell-type specificity of gene expression. Using 3D-HiChAT, we identify, in silico, candidate functional enhancers and hubs in each cell lineage, and with CRISPRi experiments, we validate several enhancers that control gene expression in their respective lineages. Our study identifies 3D regulatory hubs associated with the earliest mammalian lineages and describes their relationship to gene expression and cell identity, providing a framework to comprehensively understand lineage-specific transcriptional behaviors.
Collapse
Affiliation(s)
- Dylan Murphy
- Sanford I. Weill Department of Medicine, Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, New York, NY, USA
- Physiology, Biophysics and Systems Biology Program, Weill Cornell Graduate School of Medical Sciences, Cornell University, New York, NY, USA
| | - Eralda Salataj
- Sanford I. Weill Department of Medicine, Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, New York, NY, USA
| | - Dafne Campigli Di Giammartino
- Sanford I. Weill Department of Medicine, Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, New York, NY, USA
- 3D Chromatin Conformation and RNA Genomics Laboratory, Center for Human Technologies (CHT), Istituto Italiano di Tecnologia (IIT), Genova, Italy
| | - Javier Rodriguez-Hernaez
- Department of Pathology, New York University Langone Health, New York, NY, USA
- Department of Medicine, New York University Langone Health, New York, NY, USA
- Applied Bioinformatics Laboratory, New York University Langone Health, New York, NY, USA
| | - Andreas Kloetgen
- Department of Pathology, New York University Langone Health, New York, NY, USA
- Department of Medicine, New York University Langone Health, New York, NY, USA
- Applied Bioinformatics Laboratory, New York University Langone Health, New York, NY, USA
| | - Vidur Garg
- Developmental Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Biochemistry Cell and Molecular Biology Program, Weill Cornell Graduate School of Medical Sciences, Cornell University, New York, NY, USA
| | - Erin Char
- Tri-Institutional Training Program in Computational Biology and Medicine, Weill Cornell Medical College, New York, NY, USA
| | - Christopher M Uyehara
- Sanford I. Weill Department of Medicine, Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, New York, NY, USA
| | - Ly-Sha Ee
- Sanford I. Weill Department of Medicine, Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, New York, NY, USA
| | - UkJin Lee
- Sanford I. Weill Department of Medicine, Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, New York, NY, USA
- Biochemistry Cell and Molecular Biology Program, Weill Cornell Graduate School of Medical Sciences, Cornell University, New York, NY, USA
| | - Matthias Stadtfeld
- Sanford I. Weill Department of Medicine, Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, New York, NY, USA
| | - Anna-Katerina Hadjantonakis
- Developmental Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Aristotelis Tsirigos
- Department of Pathology, New York University Langone Health, New York, NY, USA
- Department of Medicine, New York University Langone Health, New York, NY, USA
- Applied Bioinformatics Laboratory, New York University Langone Health, New York, NY, USA
| | - Alexander Polyzos
- Sanford I. Weill Department of Medicine, Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, New York, NY, USA
| | - Effie Apostolou
- Sanford I. Weill Department of Medicine, Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, New York, NY, USA
| |
|
11
|
Wang Q, Zhang J, Liu Z, Duan Y, Li C. Integrative approaches based on genomic techniques in the functional studies on enhancers. Brief Bioinform 2023; 25:bbad442. [PMID: 38048082 PMCID: PMC10694556 DOI: 10.1093/bib/bbad442] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Revised: 10/22/2023] [Accepted: 11/08/2023] [Indexed: 12/05/2023] Open
Abstract
With the development of sequencing technology and the dramatic drop in sequencing cost, the functions of noncoding genes are being characterized in a wide variety of fields (e.g. biomedicine). Enhancers are noncoding DNA elements with vital transcription regulation functions. Tens of thousands of enhancers have been identified in the human genome; however, the location, function, target genes and regulatory mechanisms of most enhancers have not been elucidated thus far. As high-throughput sequencing techniques have leapt forwards, omics approaches have been extensively employed in enhancer research. Multidimensional genomic data integration enables the full exploration of the data and provides novel perspectives for screening, identification and characterization of the function and regulatory mechanisms of unknown enhancers. However, multidimensional genomic data are still difficult to integrate genome wide due to complex varieties, massive amounts, high rarity, etc. To facilitate the appropriate methods for studying enhancers with high efficacy, we delineate the principles, data processing modes and progress of various omics approaches to study enhancers and summarize the applications of traditional machine learning and deep learning in multi-omics integration in the enhancer field. In addition, the challenges encountered during the integration of multiple omics data are addressed. Overall, this review provides a comprehensive foundation for enhancer analysis.
Affiliation(s)
- Qilin Wang
- School of Engineering Medicine, Beihang University, Beijing 100191, China
- School of Biological Science and Medical Engineering, Beihang University, Beijing 100191, China
| | - Junyou Zhang
- School of Engineering Medicine, Beihang University, Beijing 100191, China
- School of Biological Science and Medical Engineering, Beihang University, Beijing 100191, China
| | - Zhaoshuo Liu
- School of Engineering Medicine, Beihang University, Beijing 100191, China
- School of Biological Science and Medical Engineering, Beihang University, Beijing 100191, China
| | - Yingying Duan
- School of Engineering Medicine, Beihang University, Beijing 100191, China
- School of Biological Science and Medical Engineering, Beihang University, Beijing 100191, China
| | - Chunyan Li
- School of Engineering Medicine, Beihang University, Beijing 100191, China
- School of Biological Science and Medical Engineering, Beihang University, Beijing 100191, China
- Key Laboratory of Big Data-Based Precision Medicine (Ministry of Industry and Information Technology), Beihang University, Beijing 100191, China
- Beijing Advanced Innovation Center for Big Data-Based Precision Medicine, Beihang University, Beijing 100191, China
| |
|
12
|
Li Y, Lu Y, Kang C, Li P, Chen L. Revealing Tissue Heterogeneity and Spatial Dark Genes from Spatially Resolved Transcriptomics by Multiview Graph Networks. Research (Washington, D.C.) 2023; 6:0228. [PMID: 37736108 PMCID: PMC10511271 DOI: 10.34133/research.0228] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2023] [Accepted: 08/25/2023] [Indexed: 09/23/2023]
Abstract
Spatially resolved transcriptomics (SRT) is capable of comprehensively characterizing gene expression patterns and providing an unbiased image of spatial composition. To fully understand the organizational complexity and tumor immune escape mechanism, we propose stMGATF, a multiview graph attention fusion model that integrates gene expression, histological images, spatial location, and gene association. To better extract information, stMGATF exploits SimCLRv2 for visual feature extraction and employs edge-feature-enhanced graph attention networks to learn a latent embedding of each view. A global attention mechanism is used to adaptively integrate 3 views to obtain a low-dimensional representation. Applied to diverse SRT datasets, stMGATF is robust and outperforms other methods in detecting spatial domains and denoising data even with different resolutions and platforms. In particular, stMGATF contributes to the elucidation of tissue heterogeneity and extraction of 3-dimensional expression domains. Importantly, considering the associations between genes in tumors, stMGATF can identify the spatial dark genes ignored by traditional methods, which can be used to predict tumor-driving transcription factors and reveal tumor immune escape mechanisms, providing theoretical evidence for the development of new immunotherapeutic strategies.
Affiliation(s)
- Ying Li
- School of Mathematics and Statistics, Henan University of Science and Technology, Luoyang, 471023, China
| | - Yuejing Lu
- School of Mathematics and Statistics, Henan University of Science and Technology, Luoyang, 471023, China
| | - Chen Kang
- School of Mathematics and Statistics, Henan University of Science and Technology, Luoyang, 471023, China
| | - Peiluan Li
- School of Mathematics and Statistics, Henan University of Science and Technology, Luoyang, 471023, China
- Longmen Laboratory, Luoyang, Henan, 471003, China
| | - Luonan Chen
- Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai, 201100, China
- Key Laboratory of Systems Health Science of Zhejiang Province, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, 310000, China
- School of Life Science and Technology, ShanghaiTech University, Shanghai, 201100, China
| |
|
13
|
Murphy D, Salataj E, Di Giammartino DC, Rodriguez-Hernaez J, Kloetgen A, Garg V, Char E, Uyehara CM, Ee LS, Lee U, Stadtfeld M, Hadjantonakis AK, Tsirigos A, Polyzos A, Apostolou E. Systematic mapping and modeling of 3D enhancer-promoter interactions in early mouse embryonic lineages reveal regulatory principles that determine the levels and cell-type specificity of gene expression. bioRxiv 2023:2023.07.19.549714. [PMID: 37577543 PMCID: PMC10422694 DOI: 10.1101/2023.07.19.549714] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/15/2023]
Abstract
Mammalian embryogenesis commences with two pivotal and binary cell fate decisions that give rise to three essential lineages, the trophectoderm (TE), the epiblast (EPI) and the primitive endoderm (PrE). Although key signaling pathways and transcription factors that control these early embryonic decisions have been identified, the non-coding regulatory elements via which transcriptional regulators enact these fates remain understudied. To address this gap, we have characterized, at a genome-wide scale, enhancer activity and 3D connectivity in embryo-derived stem cell lines that represent each of the early developmental fates. We observed extensive enhancer remodeling and fine-scale 3D chromatin rewiring among the three lineages, which strongly associate with transcriptional changes, although there are distinct groups of genes that are irresponsive to topological changes. In each lineage, a high degree of connectivity or "hubness" positively correlates with levels of gene expression and enriches for cell-type specific and essential genes. Genes within 3D hubs also show a significantly stronger probability of coregulation across lineages, compared to genes in linear proximity or within the same contact domains. By incorporating 3D chromatin features, we build a novel predictive model for transcriptional regulation (3D-HiChAT), which outperformed models that use only 1D promoter or proximal variables in predicting levels and cell-type specificity of gene expression. Using 3D-HiChAT, we performed genome-wide in silico perturbations to nominate candidate functional enhancers and hubs in each cell lineage, and with CRISPRi experiments we validated several novel enhancers that control expression of one or more genes in their respective lineages. 
Our study comprehensively identifies 3D regulatory hubs associated with the earliest mammalian lineages and describes their relationship to gene expression and cell identity, providing a framework to understand lineage-specific transcriptional behaviors.
Affiliation(s)
- Dylan Murphy
- Sanford I. Weill Department of Medicine, Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, New York, NY, United States
| | - Eralda Salataj
- Sanford I. Weill Department of Medicine, Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, New York, NY, United States
| | - Dafne Campigli Di Giammartino
- Sanford I. Weill Department of Medicine, Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, New York, NY, United States
- 3D Chromatin Conformation and RNA Genomics Laboratory, Istituto Italiano di Tecnologia (IIT), Center for Human Technologies (CHT), Genova, Italy (current affiliation)
| | - Javier Rodriguez-Hernaez
- Department of Pathology, New York University Langone Health, New York, NY 10016, USA
- Applied Bioinformatics Laboratory, New York University Langone Health, New York, NY 10016, USA
| | - Andreas Kloetgen
- Department of Pathology, New York University Langone Health, New York, NY 10016, USA
- Applied Bioinformatics Laboratory, New York University Langone Health, New York, NY 10016, USA
| | - Vidur Garg
- Developmental Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; Biochemistry Cell and Molecular Biology Program, Weill Cornell Graduate School of Medical Sciences, Cornell University, New York, NY 10065, USA
| | - Erin Char
- Tri-Institutional Training Program in Computational Biology and Medicine, Weill Cornell Medical College, New York, NY 10065, USA
| | - Christopher M. Uyehara
- Sanford I. Weill Department of Medicine, Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, New York, NY, United States
| | - Ly-sha Ee
- Sanford I. Weill Department of Medicine, Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, New York, NY, United States
| | - UkJin Lee
- Sanford I. Weill Department of Medicine, Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, New York, NY, United States
| | - Matthias Stadtfeld
- Sanford I. Weill Department of Medicine, Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, New York, NY, United States
| | - Anna-Katerina Hadjantonakis
- Developmental Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; Biochemistry Cell and Molecular Biology Program, Weill Cornell Graduate School of Medical Sciences, Cornell University, New York, NY 10065, USA
| | - Aristotelis Tsirigos
- Department of Pathology, New York University Langone Health, New York, NY 10016, USA
- Applied Bioinformatics Laboratory, New York University Langone Health, New York, NY 10016, USA
| | - Alexander Polyzos
- Sanford I. Weill Department of Medicine, Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, New York, NY, United States
| | - Effie Apostolou
- Sanford I. Weill Department of Medicine, Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, New York, NY, United States
| |
|
14
|
Zhang Z, Feng F, Qiu Y, Liu J. A generalizable framework to comprehensively predict epigenome, chromatin organization, and transcriptome. Nucleic Acids Res 2023; 51:5931-5947. [PMID: 37224527 PMCID: PMC10325920 DOI: 10.1093/nar/gkad436] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 10/20/2022] [Revised: 03/31/2023] [Accepted: 05/09/2023] [Indexed: 05/26/2023] Open
Abstract
Many deep learning approaches have been proposed to predict epigenetic profiles, chromatin organization, and transcription activity. While these approaches achieve satisfactory performance in predicting one modality from another, the learned representations are not generalizable across predictive tasks or across cell types. In this paper, we propose a deep learning approach named EPCOT which employs a pre-training and fine-tuning framework, and is able to accurately and comprehensively predict multiple modalities including epigenome, chromatin organization, transcriptome, and enhancer activity for new cell types, by only requiring cell-type specific chromatin accessibility profiles. Many of these predicted modalities, such as Micro-C and ChIA-PET, are quite expensive to get in practice, and the in silico prediction from EPCOT should be quite helpful. Furthermore, this pre-training and fine-tuning framework allows EPCOT to identify generic representations generalizable across different predictive tasks. Interpreting EPCOT models also provides biological insights including mapping between different genomic modalities, identifying TF sequence binding patterns, and analyzing cell-type specific TF impacts on enhancer activity.
Affiliation(s)
- Zhenhao Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, 500 S. State St, Ann Arbor, MI 48109, USA
| | - Fan Feng
- Department of Computational Medicine and Bioinformatics, University of Michigan, 500 S. State St, Ann Arbor, MI 48109, USA
| | - Yiyang Qiu
- Department of Computer Science and Engineering, University of Michigan, 500 S. State St, Ann Arbor, MI 48109, USA
| | - Jie Liu
- Department of Computational Medicine and Bioinformatics, University of Michigan, 500 S. State St, Ann Arbor, MI 48109, USA
- Department of Computer Science and Engineering, University of Michigan, 500 S. State St, Ann Arbor, MI 48109, USA
| |
|
15
|
Lee D, Yang J, Kim S. Learning the histone codes with large genomic windows and three-dimensional chromatin interactions using transformer. Nat Commun 2022; 13:6678. [PMID: 36335101 PMCID: PMC9637148 DOI: 10.1038/s41467-022-34152-5] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 03/21/2022] [Accepted: 10/17/2022] [Indexed: 11/06/2022] Open
Abstract
Many computational studies have attempted to quantitatively characterize transcriptional control by histone modifications, but most focus only on narrow, linear genomic regions around promoters, leaving room for improvement. We present Chromoformer, a transformer-based, three-dimensional chromatin conformation-aware deep learning architecture that achieves state-of-the-art performance in the quantitative deciphering of the histone codes in gene regulation. The core of the Chromoformer architecture lies in three variants of the attention operation, each specialized to model an individual level in the hierarchy of transcriptional regulation, ranging from core promoters to distal elements in contact with promoters through three-dimensional chromatin interactions. In-depth interpretation of Chromoformer reveals that it adaptively utilizes the long-range dependencies between histone modifications associated with transcription initiation and elongation. We also show that the quantitative kinetics of transcription factories and Polycomb group bodies can be captured by Chromoformer. Together, our study highlights the great advantage of attention-based deep modeling of complex interactions in epigenomes.
Affiliation(s)
- Dohoon Lee
- Bioinformatics Institute, Seoul National University, Seoul, 08826 Republic of Korea; BK21 FOUR Intelligence Computing, Seoul National University, Seoul, 08826 Republic of Korea
| | - Jeewon Yang
- Interdisciplinary Program in Artificial Intelligence, Seoul National University, Seoul, 08826 Republic of Korea
| | - Sun Kim
- Interdisciplinary Program in Artificial Intelligence, Seoul National University, Seoul, 08826 Republic of Korea; Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 08826 Republic of Korea; Department of Computer Science and Engineering, Seoul National University, Seoul, 08826 Republic of Korea; AIGENDRUG Co., Ltd., Seoul, 08826 Republic of Korea
| |
|
16
|
Karbalayghareh A, Sahin M, Leslie CS. Chromatin interaction-aware gene regulatory modeling with graph attention networks. Genome Res 2022; 32:930-944. [PMID: 35396274 PMCID: PMC9104700 DOI: 10.1101/gr.275870.121] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 06/07/2021] [Accepted: 04/05/2022] [Indexed: 11/24/2022]
Abstract
Linking distal enhancers to genes and modeling their impact on target gene expression are longstanding unresolved problems in regulatory genomics and critical for interpreting noncoding genetic variation. Here, we present a new deep learning approach called GraphReg that exploits 3D interactions from chromosome conformation capture assays to predict gene expression from 1D epigenomic data or genomic DNA sequence. By using graph attention networks to exploit the connectivity of distal elements up to 2 Mb away in the genome, GraphReg more faithfully models gene regulation and more accurately predicts gene expression levels than the state-of-the-art deep learning methods for this task. Feature attribution used with GraphReg accurately identifies functional enhancers of genes, as validated by CRISPRi-FlowFISH and TAP-seq assays, outperforming both convolutional neural networks (CNNs) and the recently proposed activity-by-contact model. Sequence-based GraphReg also accurately predicts direct transcription factor (TF) targets as validated by CRISPRi TF knockout experiments via in silico ablation of TF binding motifs. GraphReg therefore represents an important advance in modeling the regulatory impact of epigenomic and sequence elements.
Affiliation(s)
- Alireza Karbalayghareh
- Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, New York 10065, USA
| | - Merve Sahin
- Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, New York 10065, USA
| | - Christina S Leslie
- Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, New York 10065, USA
| |
|