1
Very N, Boulet C, Gheeraert C, Berthier A, Johanns M, Bou Saleh M, Guille L, Bray F, Strub JM, Bobowski-Gerard M, Zummo FP, Vallez E, Molendi-Coste O, Woitrain E, Cianférani S, Montaigne D, Ntandja-Wandji LC, Dubuquoy L, Dubois-Chevalier J, Staels B, Lefebvre P, Eeckhoute J. O-GlcNAcylation controls pro-fibrotic transcriptional regulatory signaling in myofibroblasts. Cell Death Dis 2024; 15:391. [PMID: 38830870 PMCID: PMC11148087 DOI: 10.1038/s41419-024-06773-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2024] [Revised: 05/21/2024] [Accepted: 05/23/2024] [Indexed: 06/05/2024]
Abstract
Tissue injury causes activation of mesenchymal lineage cells into wound-repairing myofibroblasts (MFs), whose uncontrolled activity ultimately leads to fibrosis. Although this process is triggered by deep metabolic and transcriptional reprogramming, functional links between these two key events are not yet understood. Here, we report that the metabolic sensor post-translational modification O-linked β-D-N-acetylglucosaminylation (O-GlcNAcylation) is increased and required for myofibroblastic activation. Inhibition of protein O-GlcNAcylation impairs archetypal myofibroblast cellular activities including extracellular matrix gene expression and collagen secretion/deposition as defined in vitro and using ex vivo and in vivo murine liver injury models. Mechanistically, a multi-omics approach combining proteomic, epigenomic, and transcriptomic data mining revealed that O-GlcNAcylation controls the MF transcriptional program by targeting the transcription factors Basonuclin 2 (BNC2) and TEA domain transcription factor 4 (TEAD4) together with the Yes-associated protein 1 (YAP1) co-activator. Indeed, inhibition of protein O-GlcNAcylation impedes their stability leading to decreased functionality of the BNC2/TEAD4/YAP1 complex towards promoting activation of the MF transcriptional regulatory landscape. We found that this involves O-GlcNAcylation of BNC2 at Thr455 and Ser490 and of TEAD4 at Ser69 and Ser99. Altogether, this study unravels protein O-GlcNAcylation as a key determinant of myofibroblastic activation and identifies its inhibition as an avenue to intervene with fibrogenic processes.
Affiliation(s)
- Ninon Very
  - Univ. Lille, Inserm, CHU Lille, Institut Pasteur de Lille, U1011-EGID, Lille, France
- Clémence Boulet
  - Univ. Lille, Inserm, CHU Lille, Institut Pasteur de Lille, U1011-EGID, Lille, France
- Céline Gheeraert
  - Univ. Lille, Inserm, CHU Lille, Institut Pasteur de Lille, U1011-EGID, Lille, France
- Alexandre Berthier
  - Univ. Lille, Inserm, CHU Lille, Institut Pasteur de Lille, U1011-EGID, Lille, France
- Manuel Johanns
  - Univ. Lille, Inserm, CHU Lille, Institut Pasteur de Lille, U1011-EGID, Lille, France
- Mohamed Bou Saleh
  - Univ. Lille, Inserm, CHU Lille, U1286 - INFINITE - Institute for Translational Research in Inflammation, Lille, France
- Loïc Guille
  - Univ. Lille, Inserm, CHU Lille, Institut Pasteur de Lille, U1011-EGID, Lille, France
- Fabrice Bray
  - Miniaturization for Synthesis, Analysis & Proteomics, UAR 3290, CNRS, University of Lille, Villeneuve d'Ascq Cedex, France
- Jean-Marc Strub
  - Laboratoire de Spectrométrie de Masse BioOrganique, CNRS UMR7178, Univ. Strasbourg, IPHC, Infrastructure Nationale de Protéomique ProFI - FR2048, Strasbourg, France
- Marie Bobowski-Gerard
  - Univ. Lille, Inserm, CHU Lille, Institut Pasteur de Lille, U1011-EGID, Lille, France
- Francesco P Zummo
  - Univ. Lille, Inserm, CHU Lille, Institut Pasteur de Lille, U1011-EGID, Lille, France
- Emmanuelle Vallez
  - Univ. Lille, Inserm, CHU Lille, Institut Pasteur de Lille, U1011-EGID, Lille, France
- Olivier Molendi-Coste
  - Univ. Lille, Inserm, CHU Lille, Institut Pasteur de Lille, U1011-EGID, Lille, France
- Eloise Woitrain
  - Univ. Lille, Inserm, CHU Lille, Institut Pasteur de Lille, U1011-EGID, Lille, France
- Sarah Cianférani
  - Laboratoire de Spectrométrie de Masse BioOrganique, CNRS UMR7178, Univ. Strasbourg, IPHC, Infrastructure Nationale de Protéomique ProFI - FR2048, Strasbourg, France
- David Montaigne
  - Univ. Lille, Inserm, CHU Lille, Institut Pasteur de Lille, U1011-EGID, Lille, France
- Line Carolle Ntandja-Wandji
  - Univ. Lille, Inserm, CHU Lille, U1286 - INFINITE - Institute for Translational Research in Inflammation, Lille, France
- Laurent Dubuquoy
  - Univ. Lille, Inserm, CHU Lille, U1286 - INFINITE - Institute for Translational Research in Inflammation, Lille, France
- Bart Staels
  - Univ. Lille, Inserm, CHU Lille, Institut Pasteur de Lille, U1011-EGID, Lille, France
- Philippe Lefebvre
  - Univ. Lille, Inserm, CHU Lille, Institut Pasteur de Lille, U1011-EGID, Lille, France
- Jérôme Eeckhoute
  - Univ. Lille, Inserm, CHU Lille, Institut Pasteur de Lille, U1011-EGID, Lille, France
2
Feng X, Liu S, Li K, Bu F, Yuan H. NCAD v1.0: a database for non-coding variant annotation and interpretation. J Genet Genomics 2024; 51:230-242. [PMID: 38142743 DOI: 10.1016/j.jgg.2023.12.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2023] [Revised: 12/15/2023] [Accepted: 12/18/2023] [Indexed: 12/26/2023]
Abstract
The application of whole genome sequencing is expanding in clinical diagnostics across various genetic disorders, and the significance of non-coding variants in penetrant diseases is increasingly being demonstrated. Therefore, it is urgent to improve the diagnostic yield by exploring the pathogenic mechanisms of variants in non-coding regions. However, the interpretation of non-coding variants remains a significant challenge, due to the complex functional regulatory mechanisms of non-coding regions and the current limitations of available databases and tools. Hence, we develop the non-coding variant annotation database (NCAD, http://www.ncawdb.net/), encompassing comprehensive insights into 665,679,194 variants, regulatory elements, and element interaction details. Integrating data from 96 sources, spanning both GRCh37 and GRCh38 versions, NCAD v1.0 provides vital information to support the genetic diagnosis of non-coding variants, including allele frequencies of 12 diverse populations, with a particular focus on the population frequency information for 230,235,698 variants in 20,964 Chinese individuals. Moreover, it offers prediction scores for variant functionality, five categories of regulatory elements, and four types of non-coding RNAs. With its rich data and comprehensive coverage, NCAD serves as a valuable platform, empowering researchers and clinicians with profound insights into non-coding regulatory mechanisms while facilitating the interpretation of non-coding variants.
Affiliation(s)
- Xiaoshu Feng
  - Institute of Rare Diseases, West China Hospital, Sichuan University, Chengdu, Sichuan 610044, China
- Sihan Liu
  - Institute of Rare Diseases, West China Hospital, Sichuan University, Chengdu, Sichuan 610044, China
- Ke Li
  - Institute of Rare Diseases, West China Hospital, Sichuan University, Chengdu, Sichuan 610044, China
- Fengxiao Bu
  - Institute of Rare Diseases, West China Hospital, Sichuan University, Chengdu, Sichuan 610044, China
- Huijun Yuan
  - Institute of Rare Diseases, West China Hospital, Sichuan University, Chengdu, Sichuan 610044, China
3
Liu Z, Wong HM, Chen X, Lin J, Zhang S, Yan S, Wang F, Li X, Wong KC. MotifHub: Detection of trans-acting DNA motif group with probabilistic modeling algorithm. Comput Biol Med 2024; 168:107753. [PMID: 38039889 DOI: 10.1016/j.compbiomed.2023.107753] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2023] [Revised: 10/30/2023] [Accepted: 11/20/2023] [Indexed: 12/03/2023]
Abstract
BACKGROUND: Trans-acting factors, a group of proteins that can directly or indirectly recognize or bind to the 8-12 bp core sequence of cis-acting elements and regulate the transcription efficiency of target genes, are of special importance in transcription regulation. Progress in high-throughput chromatin capture technology (e.g., Hi-C) enables the identification of chromatin-interacting sequence groups in which trans-acting DNA motif groups can be discovered. The difficulty of the problem lies in the combinatorial nature of DNA sequence pattern matching and its underlying sequence pattern search space. METHOD: Here, we propose MotifHub for trans-acting DNA motif group discovery on grouped sequences. Specifically, the main approach is to develop probabilistic modeling that accommodates the stochastic nature of DNA motif patterns. RESULTS: Based on the modeling, we develop global sampling techniques based on EM and Gibbs sampling to address the global optimization challenge of model fitting with latent variables. The results show that our proposed approaches achieve promising performance with linear time complexity. CONCLUSION: MotifHub is a novel algorithm that considers the identification of both DNA co-binding motif groups and trans-acting TFs. Our study paves the way for identifying hub TFs of stem cell development (OCT4 and SOX2) and determining potential therapeutic targets of prostate cancer (FOXA1 and MYC). To ensure scientific reproducibility and long-term impact, its matrix-algebra-optimized source code is released at http://bioinfo.cs.cityu.edu.hk/MotifHub.
Affiliation(s)
- Zhe Liu
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Kowloon, Hong Kong, China
- Hiu-Man Wong
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Kowloon, Hong Kong, China
- Xingjian Chen
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Kowloon, Hong Kong, China
- Jiecong Lin
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Kowloon, Hong Kong, China
- Shixiong Zhang
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Kowloon, Hong Kong, China
- Shankai Yan
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Kowloon, Hong Kong, China
- Fuzhou Wang
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Kowloon, Hong Kong, China
- Xiangtao Li
  - School of Artificial Intelligence, Jilin University, Jilin, China
- Ka-Chun Wong
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Kowloon, Hong Kong, China
4
Li Y, Ju F, Chen Z, Qu Y, Xia H, He L, Wu L, Zhu J, Shao B, Deng P. CREaTor: zero-shot cis-regulatory pattern modeling with attention mechanisms. Genome Biol 2023; 24:266. [PMID: 37996959 PMCID: PMC10666311 DOI: 10.1186/s13059-023-03103-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Accepted: 11/03/2023] [Indexed: 11/25/2023] Open
Abstract
Linking cis-regulatory sequences to target genes has been a long-standing challenge. In this study, we introduce CREaTor, an attention-based deep neural network designed to model cis-regulatory patterns for genomic elements up to 2 Mb from target genes. Coupled with a training strategy that predicts gene expression from flanking candidate cis-regulatory elements (cCREs), CREaTor can model cell type-specific cis-regulatory patterns in new cell types without prior knowledge of cCRE-gene interactions or additional training. The zero-shot modeling capability, combined with the use of only RNA-seq and ChIP-seq data, allows for the ready generalization of CREaTor to a broad range of cell types.
Affiliation(s)
- Yongge Li
  - Microsoft Research AI4Science, Beijing, China
  - School of Medicine, Tsinghua University, Beijing, China
- Fusong Ju
  - Microsoft Research AI4Science, Beijing, China
- Zhiyuan Chen
  - Microsoft Research AI4Science, Beijing, China
  - School of Computing, Australian National University, Canberra, Australia
- Yiming Qu
  - Microsoft Research AI4Science, Beijing, China
  - School of Life Sciences, Tsinghua University, Beijing, China
- Liang He
  - Microsoft Research AI4Science, Beijing, China
- Lijun Wu
  - Microsoft Research AI4Science, Beijing, China
- Jianwei Zhu
  - Microsoft Research AI4Science, Beijing, China
- Bin Shao
  - Microsoft Research AI4Science, Beijing, China
- Pan Deng
  - Microsoft Research AI4Science, Beijing, China
5
Wang Q, Zhang J, Liu Z, Duan Y, Li C. Integrative approaches based on genomic techniques in the functional studies on enhancers. Brief Bioinform 2023; 25:bbad442. [PMID: 38048082 PMCID: PMC10694556 DOI: 10.1093/bib/bbad442] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Revised: 10/22/2023] [Accepted: 11/08/2023] [Indexed: 12/05/2023] Open
Abstract
With the development of sequencing technology and the dramatic drop in sequencing cost, the functions of noncoding genes are being characterized in a wide variety of fields (e.g. biomedicine). Enhancers are noncoding DNA elements with vital transcription regulation functions. Tens of thousands of enhancers have been identified in the human genome; however, the location, function, target genes and regulatory mechanisms of most enhancers have not been elucidated thus far. As high-throughput sequencing techniques have leapt forwards, omics approaches have been extensively employed in enhancer research. Multidimensional genomic data integration enables the full exploration of the data and provides novel perspectives for screening, identification and characterization of the function and regulatory mechanisms of unknown enhancers. However, multidimensional genomic data are still difficult to integrate genome wide due to complex varieties, massive amounts, high rarity, etc. To facilitate the appropriate methods for studying enhancers with high efficacy, we delineate the principles, data processing modes and progress of various omics approaches to study enhancers and summarize the applications of traditional machine learning and deep learning in multi-omics integration in the enhancer field. In addition, the challenges encountered during the integration of multiple omics data are addressed. Overall, this review provides a comprehensive foundation for enhancer analysis.
Affiliation(s)
- Qilin Wang
  - School of Engineering Medicine, Beihang University, Beijing 100191, China
  - School of Biological Science and Medical Engineering, Beihang University, Beijing 100191, China
- Junyou Zhang
  - School of Engineering Medicine, Beihang University, Beijing 100191, China
  - School of Biological Science and Medical Engineering, Beihang University, Beijing 100191, China
- Zhaoshuo Liu
  - School of Engineering Medicine, Beihang University, Beijing 100191, China
  - School of Biological Science and Medical Engineering, Beihang University, Beijing 100191, China
- Yingying Duan
  - School of Engineering Medicine, Beihang University, Beijing 100191, China
  - School of Biological Science and Medical Engineering, Beihang University, Beijing 100191, China
- Chunyan Li
  - School of Engineering Medicine, Beihang University, Beijing 100191, China
  - School of Biological Science and Medical Engineering, Beihang University, Beijing 100191, China
  - Key Laboratory of Big Data-Based Precision Medicine (Ministry of Industry and Information Technology), Beihang University, Beijing 100191, China
  - Beijing Advanced Innovation Center for Big Data-Based Precision Medicine, Beihang University, Beijing 100191, China
6
Umarov R, Hon CC. Enhancer target prediction: state-of-the-art approaches and future prospects. Biochem Soc Trans 2023; 51:1975-1988. [PMID: 37830459 DOI: 10.1042/bst20230917] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Revised: 10/02/2023] [Accepted: 10/02/2023] [Indexed: 10/14/2023]
Abstract
Enhancers are genomic regions that regulate gene transcription and are located far away from the transcription start sites of their target genes. Enhancers are highly enriched in disease-associated variants and thus deciphering the interactions between enhancers and genes is crucial to understanding the molecular basis of genetic predispositions to diseases. Experimental validations of enhancer targets can be laborious. Computational methods have thus emerged as a valuable alternative for studying enhancer-gene interactions. A variety of computational methods have been developed to predict enhancer targets by incorporating genomic features (e.g. conservation, distance, and sequence), epigenomic features (e.g. histone marks and chromatin contacts) and activity measurements (e.g. covariations of enhancer activity and gene expression). With the recent advances in genome perturbation and chromatin conformation capture technologies, data on experimentally validated enhancer targets are becoming available for supervised training of these methods and evaluation of their performance. In this review, we categorize enhancer target prediction methods based on their rationales and approaches. Then we discuss their merits and limitations and highlight the future directions for enhancer targets prediction.
Affiliation(s)
- Ramzan Umarov
  - RIKEN Centre for Integrative Medical Sciences, Yokohama RIKEN Institute, Yokohama, Japan
- Chung-Chau Hon
  - RIKEN Centre for Integrative Medical Sciences, Yokohama RIKEN Institute, Yokohama, Japan
7
McManus JN, Lovelett RJ, Lowengrub D, Christensen S. A unifying statistical framework to discover disease genes from GWASs. Cell Genomics 2023; 3:100264. [PMID: 36950381 PMCID: PMC10025450 DOI: 10.1016/j.xgen.2023.100264] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/30/2022] [Revised: 09/07/2022] [Accepted: 01/19/2023] [Indexed: 03/10/2023]
Abstract
Genome-wide association studies (GWASs) identify genomic loci associated with complex traits, but it remains a challenge to identify the genes affected by causal genetic variants in these loci. Attempts to solve this challenge are frustrated by a number of compounding problems. Here, we show how to combine solutions to these problems into a unified mathematical framework. From this synthesis, it becomes possible to compute the probability that each gene in the genome is affected by a causal variant, given a particular trait, without making assumptions about the relevant cell types or tissues. We validate each component of the framework individually and in combination. When applied to large GWASs of human disease, the resulting paradigm can rediscover the majority of well-known disease genes. Moreover, it establishes human genetics support for many genes previously implicated only by clinical or preclinical evidence, and it uncovers a plethora of novel disease genes with compelling biological rationale.
Affiliation(s)
- Justin N.J. McManus
  - Kallyope, Inc., 430 East 29th Street, New York, NY 10016, USA
  - Corresponding author
8
Kaur A, Chauhan APS, Aggarwal AK. Prediction of Enhancers in DNA Sequence Data using a Hybrid CNN-DLSTM Model. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:1327-1336. [PMID: 35417351 DOI: 10.1109/tcbb.2022.3167090] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Indexed: 05/04/2023]
Abstract
An enhancer, a distal cis-regulatory element, controls gene expression. Experimental prediction of enhancer elements is time-consuming and expensive. Consequently, various fast and inexpensive deep learning-based methods have been developed for predicting enhancers and determining their strength. In this paper, we propose a two-stage deep learning-based framework leveraging DNA structural features, natural language processing, a convolutional neural network, and long short-term memory to predict enhancer elements accurately in genomics data. In the first stage, we extract features from DNA sequence data using three feature representation techniques, viz., k-mer based feature extraction along with word2vector based interpretation of underlying patterns, one-hot encoding, and the DNAshape technique. In the second stage, the strength of enhancers is predicted from the extracted features using a hybrid deep learning model. The method is capable of adapting itself to varying sizes of datasets. Also, as the proposed model can capture long-range sequencing patterns, the robustness of the method remains unaffected by minor variations in the genomic sequence. The method outperforms other state-of-the-art methods at both stages in terms of prediction accuracy, specificity, Matthews correlation coefficient, and area under the ROC curve. In summary, the proposed method is a reliable method for enhancer prediction.
9
Hoellinger T, Mestre C, Aschard H, Le Goff W, Foissac S, Faraut T, Djebali S. Enhancer/gene relationships: Need for more reliable genome-wide reference sets. Frontiers in Bioinformatics 2023; 3:1092853. [PMID: 36909938 PMCID: PMC9999192 DOI: 10.3389/fbinf.2023.1092853] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2022] [Accepted: 02/07/2023] [Indexed: 02/26/2023] Open
Abstract
Differences in cells' functions arise from differential activity of regulatory elements, including enhancers. Enhancers are cis-regulatory elements that cooperate with promoters through transcription factors to activate the expression of one or several genes by getting physically close to them in the 3D space of the nucleus. There is increasing evidence that genetic variants associated with common diseases are enriched in enhancers active in cell types relevant to these diseases. Identifying the enhancers associated with genes and, conversely, the sets of genes activated by each enhancer (the so-called enhancer/gene or E/G relationships) across cell types can help in understanding the genetic mechanisms underlying human diseases. There are three broad approaches for the genome-wide identification of E/G relationships in a cell type: 1) genetic link methods or eQTL, 2) functional link methods based on 1D functional data such as open chromatin, histone marks or gene expression and 3) spatial link methods based on 3D data such as Hi-C. Since 1) and 3) are costly, the current strategy is to develop functional link methods and to use data from 1) and 3) as reference to evaluate them. However, there is still no consensus on the best functional link method to date, and method comparisons remain rare. Here, we compared the relative performances of three recent methods for the identification of enhancer-gene links, TargetFinder, Average-Rank, and the ABC model, using the three latest benchmarks from the field: a reference that combines 3D and eQTL data, called BENGI, and two genetic screening references, called CRiFF and CRISPRi. Overall, none of the three methods performed best on the three references. CRiFF and CRISPRi reference sets are likely more reliable, but CRiFF is not genome-wide and CRiFF and CRISPRi are mostly available on the K562 cancer cell line. The BENGI reference set is genome-wide but likely contains many false positives. This study therefore calls for new reliable and genome-wide E/G reference data rather than new functional link E/G identification methods.
Affiliation(s)
- Tristan Hoellinger
  - IRSD, Université de Toulouse, INSERM, INRAE, ENVT, Univ Toulouse III - Paul Sabatier (UPS), Toulouse, France
  - INSA Toulouse, INP-ENSEEIHT, Toulouse, France
- Camille Mestre
  - GenPhySE, Université de Toulouse, INRAE, INPT, ENVT, Toulouse, France
- Hugues Aschard
  - Institut Pasteur, Université Paris Cité, Department of Computational Biology, Paris, France
  - Program in Genetic Epidemiology and Statistical Genetics, Harvard T.H. Chan School of Public Health, Boston, MA, United States
- Wilfried Le Goff
  - Sorbonne Université, INSERM, Institute of Cardiometabolism and Nutrition (ICAN), UMR_S1166, Paris, France
- Sylvain Foissac
  - GenPhySE, Université de Toulouse, INRAE, INPT, ENVT, Toulouse, France
- Thomas Faraut
  - GenPhySE, Université de Toulouse, INRAE, INPT, ENVT, Toulouse, France
- Sarah Djebali
  - IRSD, Université de Toulouse, INSERM, INRAE, ENVT, Univ Toulouse III - Paul Sabatier (UPS), Toulouse, France
  - GenPhySE, Université de Toulouse, INRAE, INPT, ENVT, Toulouse, France
10
Synthesizing genome regulation data with vote-counting. Trends Genet 2022; 38:1208-1216. [PMID: 35817619 DOI: 10.1016/j.tig.2022.06.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2022] [Revised: 05/31/2022] [Accepted: 06/16/2022] [Indexed: 01/24/2023]
Abstract
The increasing availability of high-throughput datasets allows amalgamating research information across a large body of genome regulation studies. Given the recent success of meta-analyses on transcriptional regulators, epigenetic marks, and enhancer:gene associations, we expect that such surveys will continue to provide novel and reproducible insights. However, meta-analyses are severely hampered by the diversity of available data, concurring protocols, an eclectic amount of bioinformatics tools, and myriads of conceivable parameter combinations. Such factors can easily bar life scientists from synthesizing omics data and substantially curb their interpretability. Despite statistical challenges of the method, we would like to emphasize the advantages of joining data from different sources through vote-counting and showcase examples that achieve a simple but highly intuitive data integration.
11
Koido M, Hon CC, Koyama S, Kawaji H, Murakawa Y, Ishigaki K, Ito K, Sese J, Parrish NF, Kamatani Y, Carninci P, Terao C. Prediction of the cell-type-specific transcription of non-coding RNAs from genome sequences via machine learning. Nat Biomed Eng 2022. [PMID: 36411359 DOI: 10.1038/s41551-022-00961-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/25/2021] [Accepted: 10/12/2022] [Indexed: 11/22/2022]
Abstract
Gene transcription is regulated through complex mechanisms involving non-coding RNAs (ncRNAs). As the transcription of ncRNAs, especially of enhancer RNAs, is often low and cell type specific, how the levels of RNA transcription depend on genotype remains largely unexplored. Here we report the development and utility of a machine-learning model (MENTR) that reliably links genome sequence and ncRNA expression at the cell type level. Effects on ncRNA transcription predicted by the model were concordant with estimates from published studies in a cell-type-dependent manner, regardless of allele frequency and genetic linkage. Among 41,223 variants from genome-wide association studies, the model identified 7,775 enhancer RNAs and 3,548 long ncRNAs causally associated with complex traits across 348 major human primary cells and tissues, such as rare variants plausibly altering the transcription of enhancer RNAs to influence the risks of Crohn's disease and asthma. The model may aid the discovery of causal variants and the generation of testable hypotheses for biological mechanisms driving complex traits.
Affiliation(s)
- Masaru Koido
  - Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
  - Division of Molecular Pathology, Department of Cancer Biology, Institute of Medical Science, The University of Tokyo, Tokyo, Japan
  - Laboratory of Complex Trait Genomics, Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
- Chung-Chau Hon
  - Laboratory for Genome Information Analysis, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Satoshi Koyama
  - Laboratory for Cardiovascular Genomics and Informatics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Hideya Kawaji
  - Preventive Medicine and Applied Genomics Unit, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
  - Research Center for Genome & Medical Sciences, Tokyo Metropolitan Institute of Medical Science, Tokyo, Japan
- Yasuhiro Murakawa
  - RIKEN-IFOM Joint Laboratory for Cancer Genomics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
  - IFOM ETS - The AIRC Institute of Molecular Oncology, Milan, Italy
  - Institute for the Advanced Study of Human Biology, Kyoto University, Kyoto, Japan
- Kazuyoshi Ishigaki
  - Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
  - Divisions of Genetics and Rheumatology, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
  - Center for Data Sciences, Harvard Medical School, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Kaoru Ito
  - Laboratory for Cardiovascular Genomics and Informatics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Jun Sese
  - Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology, Aomi, Koto-ku, Tokyo, Japan
  - Humanome Lab Inc., Tokyo, Japan
- Nicholas F Parrish
  - Genome Immunobiology RIKEN Hakubi Research Team, RIKEN Cluster for Pioneering Research and RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Yoichiro Kamatani
  - Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
  - Laboratory of Complex Trait Genomics, Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
- Piero Carninci
  - Laboratory for Transcriptome Technology, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
  - Laboratory for Single Cell Technologies, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
  - Human Technopole, Milan, Italy
- Chikashi Terao
  - Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
  - Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan
  - The Department of Applied Genetics, The School of Pharmaceutical Sciences, University of Shizuoka, Shizuoka, Japan
12
Functional genomics uncovers the transcription factor BNC2 as required for myofibroblastic activation in fibrosis. Nat Commun 2022; 13:5324. [PMID: 36088459 PMCID: PMC9464213 DOI: 10.1038/s41467-022-33063-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 12/01/2021] [Accepted: 08/31/2022] [Indexed: 11/21/2022] Open
Abstract
Tissue injury triggers activation of mesenchymal lineage cells into wound-repairing myofibroblasts, whose unrestrained activity leads to fibrosis. Although this process is largely controlled at the transcriptional level, whether the main transcription factors involved have all been identified has remained elusive. Here, we report multi-omics analyses unraveling Basonuclin 2 (BNC2) as a myofibroblast identity transcription factor. Using liver fibrosis as a model for in-depth investigations, we first show that BNC2 expression is induced in both mouse and human fibrotic livers from different etiologies and decreases upon human liver fibrosis regression. Importantly, we found that BNC2 transcriptional induction is a specific feature of myofibroblastic activation in fibrotic tissues. Mechanistically, BNC2 expression and activities integrate pro-fibrotic stimuli, including TGFβ and Hippo/YAP1 signaling, towards induction of matrisome genes such as those encoding type I collagen. As a consequence, Bnc2 deficiency blunts collagen deposition in livers of mice fed a fibrogenic diet. Additionally, our work establishes BNC2 as potentially druggable since we identified the thalidomide derivative CC-885 as a BNC2 inhibitor. Altogether, we propose that BNC2 is a transcription factor involved in canonical pathways driving myofibroblastic activation in fibrosis. Myofibroblasts contribute to the development of liver fibrosis. Here, the authors report that the transcription factor Basonuclin 2 (BNC2) integrates fibrogenic signals and drives myofibroblastic transcriptional activation in liver fibrosis.
|
13
|
Zhang L, Zhang J, Nie Q. DIRECT-NET: An efficient method to discover cis-regulatory elements and construct regulatory networks from single-cell multiomics data. Sci Adv 2022; 8:eabl7393. [PMID: 35648859 PMCID: PMC9159696 DOI: 10.1126/sciadv.abl7393] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 05/13/2023]
Abstract
The emergence of single-cell multiomics data provides unprecedented opportunities to scrutinize the transcriptional regulatory mechanisms controlling cell identity. However, how to use those datasets to dissect the cis-regulatory element (CRE)–to–gene relationships at a single-cell level remains a major challenge. Here, we present DIRECT-NET, a machine-learning method based on gradient boosting, to identify genome-wide CREs and their relationship to target genes, either from parallel single-cell gene expression and chromatin accessibility data or from single-cell chromatin accessibility data alone. By extensively evaluating and characterizing DIRECT-NET’s predicted CREs using independent functional genomics data, we find that DIRECT-NET substantially improves the accuracy of inferring CRE-to-gene relationships in comparison to existing methods. DIRECT-NET is also capable of revealing cell subpopulation–specific and dynamic regulatory linkages. Overall, DIRECT-NET provides an efficient tool for predicting transcriptional regulation codes from single-cell multiomics data.
Affiliation(s)
- Lihua Zhang
- School of Computer Science, Wuhan University, Wuhan 430072, China
- Department of Mathematics, University of California, Irvine, Irvine, CA 92697, USA
- NSF-Simons Center for Multiscale Cell Fate Research, University of California, Irvine, Irvine, CA 92697, USA
| | - Jing Zhang
- Department of Computer Science, University of California, Irvine, Irvine, CA 92697, USA
- Corresponding author. (J.Z.); (Q.N.)
| | - Qing Nie
- Department of Mathematics, University of California, Irvine, Irvine, CA 92697, USA
- NSF-Simons Center for Multiscale Cell Fate Research, University of California, Irvine, Irvine, CA 92697, USA
- Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA 92697, USA
- Corresponding author. (J.Z.); (Q.N.)
| |
|
14
|
Mulero Hernández J, Fernández-Breis JT. Analysis of the landscape of human enhancer sequences in biological databases. Comput Struct Biotechnol J 2022; 20:2728-2744. [PMID: 35685360 PMCID: PMC9168495 DOI: 10.1016/j.csbj.2022.05.045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2022] [Revised: 05/20/2022] [Accepted: 05/21/2022] [Indexed: 12/01/2022] Open
Abstract
The process of gene regulation extends as a network in which both genetic sequences and proteins are involved. The levels of regulation and the mechanisms involved are multiple. Transcription is the main control mechanism for most genes, with downstream steps responsible for refining the transcription patterns. In turn, gene transcription is mainly controlled by regulatory events that occur at promoters and enhancers. Several studies are focused on analyzing the contribution of enhancers to the development of diseases and their possible use as therapeutic targets. The study of regulatory elements has advanced rapidly in recent years with the development and use of next generation sequencing techniques. This work has generated a large volume of information that has been transferred to a growing number of public repositories. In this article, we analyze the content of those public repositories that contain information about human enhancers with the aim of detecting whether the knowledge generated by scientific research is contained in those databases in a way that could be computationally exploited. The analysis is based on three main aspects identified in the literature: types of enhancers, type of evidence about the enhancers, and methods for detecting enhancer-promoter interactions. Our results show that no single database facilitates the optimal exploitation of enhancer data, most types of enhancers are not represented in the databases, and there is a need for a standardized model for enhancers. We have identified major gaps and challenges for the computational exploitation of enhancer data.
Affiliation(s)
- Juan Mulero Hernández
- Dept. Informática y Sistemas, Universidad de Murcia, CEIR Campus Mare Nostrum, IMIB-Arrixaca, Spain
| | | |
|
15
|
Classification of non-coding variants with high pathogenic impact. PLoS Genet 2022; 18:e1010191. [PMID: 35486646 PMCID: PMC9094564 DOI: 10.1371/journal.pgen.1010191] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/27/2021] [Revised: 05/11/2022] [Accepted: 04/05/2022] [Indexed: 01/22/2023] Open
Abstract
Whole genome sequencing is increasingly used to diagnose medical conditions of genetic origin. While both coding and non-coding DNA variants contribute to a wide range of diseases, most patients who receive a WGS-based diagnosis today harbour a protein-coding mutation. Functional interpretation and prioritization of non-coding variants represents a persistent challenge, and disease-causing non-coding variants remain largely unidentified. Depending on the disease, WGS fails to identify a candidate variant in 20–80% of patients, severely limiting the usefulness of sequencing for personalised medicine. Here we present FINSURF, a machine-learning approach to predict the functional impact of non-coding variants in regulatory regions. FINSURF outperforms state-of-the-art methods, owing in particular to optimized selection of control variants during training. In addition to ranking candidate variants, FINSURF breaks down the score for each variant into contributions from individual annotations, facilitating the evaluation of their functional relevance. We applied FINSURF to a diverse set of 30 diseases with described causative non-coding mutations, and correctly identified the disease-causative non-coding variant within the top ten hits in 22 cases. FINSURF is implemented as an online server as well as custom browser tracks, and provides a quick and efficient solution to prioritize candidate non-coding variants in realistic clinical settings.
|
16
|
Qin T, Lee C, Li S, Cavalcante RG, Orchard P, Yao H, Zhang H, Wang S, Patil S, Boyle AP, Sartor MA. Comprehensive enhancer-target gene assignments improve gene set level interpretation of genome-wide regulatory data. Genome Biol 2022; 23:105. [PMID: 35473573 PMCID: PMC9044877 DOI: 10.1186/s13059-022-02668-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 10/09/2020] [Accepted: 04/06/2022] [Indexed: 11/16/2022] Open
Abstract
BACKGROUND Revealing the gene targets of distal regulatory elements is challenging yet critical for interpreting regulome data. Experiment-derived enhancer-gene links are restricted to a small set of enhancers and/or cell types, while the accuracy of genome-wide approaches remains elusive due to the lack of a systematic evaluation. We combined multiple spatial and in silico approaches for defining enhancer locations and linking them to their target genes aggregated across >500 cell types, generating 1860 human genome-wide distal enhancer-to-target gene definitions (EnTDefs). To evaluate performance, we used gene set enrichment (GSE) testing on 87 independent ENCODE ChIP-seq datasets of 34 transcription factors (TFs) and assessed concordance of results with known TF Gene Ontology annotations, and other benchmarks. RESULTS The top ranked 741 (40%) EnTDefs significantly outperform the common, naïve approach of linking distal regions to the nearest genes, and the top 10 EnTDefs perform well when applied to ChIP-seq data of other cell types. The GSE-based ranking of EnTDefs is highly concordant with ranking based on overlap with curated benchmarks of enhancer-gene interactions. Both our top general EnTDef and cell-type-specific EnTDefs significantly outperform seven independent computational and experiment-based enhancer-gene pair datasets. We show that using our top EnTDefs for GSE with either genome-wide DNA methylation or ATAC-seq data is able to better recapitulate the biological processes changed in gene expression data performed in parallel for the same experiment than our lower-ranked EnTDefs. CONCLUSIONS Our findings illustrate the power of our approach to provide genome-wide interpretation regardless of cell type.
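For readers unfamiliar with the baseline mentioned in this abstract, the "naïve approach of linking distal regions to the nearest genes" can be illustrated roughly as follows. This is only a minimal sketch under stated assumptions, not the authors' code or benchmark: the function name `nearest_gene`, the use of region midpoints and transcription start sites (TSSs) on a single chromosome, and all coordinates and gene names are hypothetical.

```python
from bisect import bisect_left


def nearest_gene(region_mid, tss_positions, gene_names):
    """Return the gene whose TSS is closest to a region midpoint.

    Assumes tss_positions is sorted in ascending order, aligned index by
    index with gene_names, and restricted to one chromosome.
    Ties are resolved in favour of the upstream (left) gene.
    """
    if not tss_positions:
        raise ValueError("no TSS positions supplied")
    # Index of the first TSS at or after the region midpoint.
    i = bisect_left(tss_positions, region_mid)
    # Only the neighbours on either side of that index can be nearest.
    candidates = []
    if i > 0:
        candidates.append(i - 1)
    if i < len(tss_positions):
        candidates.append(i)
    best = min(candidates, key=lambda j: abs(tss_positions[j] - region_mid))
    return gene_names[best]


# Hypothetical toy data: three genes and three distal regions.
tss = [1_000, 50_000, 120_000]
genes = ["GENE_A", "GENE_B", "GENE_C"]
links = {mid: nearest_gene(mid, tss, genes) for mid in (10_000, 90_000, 200_000)}
```

A rule of this kind ignores chromatin contacts and cell type entirely, which is the limitation the enhancer-to-target gene definitions described in the abstract are reported to address.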
Affiliation(s)
- Tingting Qin
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA.
| | - Christopher Lee
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA
- Department of Biostatistics, School of Public Health, University of Michigan Medical School, Ann Arbor, MI, USA
| | - Shiting Li
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA
| | - Raymond G Cavalcante
- Biomedical Research Core Facilities, Epigenomics Core, University of Michigan Medical School, Ann Arbor, MI, USA
| | - Peter Orchard
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA
| | - Heming Yao
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA
| | - Hanrui Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA
| | - Shuze Wang
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA
| | - Snehal Patil
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA
| | - Alan P Boyle
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, USA
| | - Maureen A Sartor
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA.
- Department of Biostatistics, School of Public Health, University of Michigan Medical School, Ann Arbor, MI, USA.
| |
|
17
|
A framework to score the effects of structural variants in health and disease. Genome Res 2022; 32:766-777. [PMID: 35197310 PMCID: PMC8997355 DOI: 10.1101/gr.275995.121] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 07/13/2021] [Accepted: 02/22/2022] [Indexed: 11/25/2022]
Abstract
While technological advances improved the identification of structural variants (SVs) in the human genome, their interpretation remains challenging. Several methods utilize individual mechanistic principles like the deletion of coding sequence or 3D genome architecture disruptions. However, a comprehensive tool using the broad spectrum of available annotations is missing. Here, we describe CADD-SV, a method to retrieve and integrate a wide set of annotations to predict the effects of SVs. Previously, supervised learning approaches were limited due to a small number and biased set of annotated pathogenic or benign SVs. We overcome this problem by using a surrogate training-objective, the Combined Annotation Dependent Depletion (CADD) of functional variants. We use human and chimpanzee derived SVs as proxy-neutral and contrast them with matched simulated variants as proxy-deleterious, an approach that has proven powerful for short sequence variants. Our tool computes summary statistics over diverse variant annotations and uses random forest models to prioritize deleterious structural variants. The resulting CADD-SV scores correlate with known pathogenic and rare population variants. We further show that we can prioritize somatic cancer variants as well as noncoding variants known to affect gene expression. We provide a website and offline-scoring tool for easy application of CADD-SV.
|
18
|
Hait TA, Elkon R, Shamir R. CT-FOCS: a novel method for inferring cell type-specific enhancer–promoter maps. Nucleic Acids Res 2022; 50:e55. [PMID: 35100425 PMCID: PMC9178001 DOI: 10.1093/nar/gkac048] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2021] [Revised: 01/09/2022] [Accepted: 01/15/2022] [Indexed: 11/13/2022] Open
Abstract
Spatiotemporal gene expression patterns are governed to a large extent by the activity of enhancer elements, which engage in physical contacts with their target genes. Identification of enhancer–promoter (EP) links that are functional only in a specific subset of cell types is a key challenge in understanding gene regulation. We introduce CT-FOCS (cell type FOCS), a statistical inference method that uses linear mixed effect models to infer EP links that show marked activity only in a single or a small subset of cell types out of a large panel of probed cell types. Analyzing 808 samples from FANTOM5, covering 472 cell lines, primary cells and tissues, CT-FOCS inferred such EP links more accurately than recent state-of-the-art methods. Furthermore, we show that strictly cell type-specific EP links are very uncommon in the human genome.
Affiliation(s)
- Tom Aharon Hait
- The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel
- Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
| | - Ran Elkon
- Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
- Sagol School of Neuroscience, Tel Aviv University, Tel Aviv 69978, Israel
| | - Ron Shamir
- The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel
| |
|
19
|
Shen Y, Zhong Q, Liu T, Wen Z, Shen W, Li L. CharID: a two-step model for universal prediction of interactions between chromatin accessible regions. Brief Bioinform 2022; 23:6514800. [PMID: 35077535 DOI: 10.1093/bib/bbab602] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2021] [Revised: 12/23/2021] [Accepted: 12/24/2021] [Indexed: 11/14/2022] Open
Abstract
Open chromatin regions (OCRs) allow direct interaction between cis-regulatory elements and trans-acting factors. Therefore, predicting all potential OCR-mediated loops is essential for deciphering the regulatory mechanisms of gene expression. However, existing loop prediction tools are restricted to specific anchor types. Here, we present CharID (Chromatin Accessible Region Interaction Detector), a two-step model that combines neural network and ensemble learning to predict OCR-mediated loops. In the first step, CharID-Anchor, an attention-based hybrid CNN-BiGRU network is constructed to discriminate between anchor and nonanchor OCRs. In the second step, CharID-Loop uses a gradient boosting decision tree with a chromosome-split strategy to predict the interactions between anchor OCRs. The performance was assessed in three human cell lines, and CharID showed superior prediction performance compared with other algorithms. In contrast to methods designed to predict a particular type of loop, CharID can detect a variety of chromatin loops not limited to enhancer-promoter loops or architectural protein-mediated loops. We constructed the OCR-mediated interaction network using the predicted loops and identified hub anchors, which are highlighted by their proximity to housekeeping genes. By analyzing loops containing SNPs associated with cardiovascular disease, we identified an SNP-gene loop indicating the regulatory mechanism of GFOD1. Taken together, CharID universally predicts diverse chromatin loops beyond other state-of-the-art methods, which are limited by anchor types, and experimental techniques, which are limited by sensitivities that decay drastically with the genomic distance between anchors. Finally, we hosted Peaksniffer, a user-friendly web server that provides online prediction, query and visualization of OCRs and associated loops.
Affiliation(s)
- Yin Shen
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, P. R. China
- 3D Genomics Research Center, Huazhong Agricultural University, Wuhan, 430070, P. R. China
| | - Quan Zhong
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, P. R. China
- 3D Genomics Research Center, Huazhong Agricultural University, Wuhan, 430070, P. R. China
| | - Tian Liu
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, P. R. China
| | - Zi Wen
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, P. R. China
- 3D Genomics Research Center, Huazhong Agricultural University, Wuhan, 430070, P. R. China
| | - Wei Shen
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, P. R. China
- 3D Genomics Research Center, Huazhong Agricultural University, Wuhan, 430070, P. R. China
| | - Li Li
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, P. R. China
- 3D Genomics Research Center, Huazhong Agricultural University, Wuhan, 430070, P. R. China
| |
|
20
|
Giacopuzzi E, Popitsch N, Taylor JC. OUP accepted manuscript. Nucleic Acids Res 2022; 50:2522-2535. [PMID: 35234913 PMCID: PMC8934622 DOI: 10.1093/nar/gkac130] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 11/26/2021] [Revised: 02/02/2022] [Accepted: 02/14/2022] [Indexed: 11/25/2022] Open
Abstract
Non-coding variants have long been recognized as important contributors to common disease risks, but with the expansion of clinical whole genome sequencing, examples of rare, high-impact non-coding variants are also accumulating. Despite recent advances in the study of regulatory elements and the availability of specialized data collections, the systematic annotation of non-coding variants from genome sequencing remains challenging. Here, we propose a new framework for the prioritization of non-coding regulatory variants that integrates information about regulatory regions with prediction scores and HPO-based prioritization. Firstly, we created a comprehensive collection of annotations for regulatory regions including a database of 2.4 million regulatory elements (GREEN-DB) annotated with controlled gene(s), tissue(s) and associated phenotype(s) where available. Secondly, we calculated a variation constraint metric and showed that constrained regulatory regions associate with disease-associated genes and essential genes from mouse knock-outs. Thirdly, we compared 19 non-coding impact prediction scores providing suggestions for variant prioritization. Finally, we developed a VCF annotation tool (GREEN-VARAN) that can integrate all these elements to annotate variants for their potential regulatory impact. In our evaluation, we show that GREEN-DB can capture previously published disease-associated non-coding variants as well as identify additional candidate disease genes in trio analyses.
Affiliation(s)
- Edoardo Giacopuzzi
- Wellcome Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK
- National Institute for Health Research Oxford Biomedical Research Centre, Oxford OX4 2PG, UK
| | - Niko Popitsch
- Wellcome Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK
- Max Perutz Labs, University of Vienna, Dr. Bohr-Gasse 9, 1030 Vienna, Austria
| | - Jenny C Taylor
- To whom correspondence should be addressed. Tel: +44 01865 287631;
| |
|
21
|
Laverre A, Tannier E, Necsulea A. Long-range promoter-enhancer contacts are conserved during evolution and contribute to gene expression robustness. Genome Res 2021; 32:280-296. [PMID: 34930799 PMCID: PMC8805723 DOI: 10.1101/gr.275901.121] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/16/2021] [Accepted: 12/16/2021] [Indexed: 11/25/2022]
Abstract
Gene expression is regulated through complex molecular interactions, involving cis-acting elements that can be situated far away from their target genes. Data on long-range contacts between promoters and regulatory elements are rapidly accumulating. However, it remains unclear how these regulatory relationships evolve and how they contribute to the establishment of robust gene expression profiles. Here, we address these questions by comparing genome-wide maps of promoter-centered chromatin contacts in mouse and human. We show that there is significant evolutionary conservation of cis-regulatory landscapes, indicating that selective pressures act to preserve not only regulatory element sequences but also their chromatin contacts with target genes. The extent of evolutionary conservation is remarkable for long-range promoter–enhancer contacts, illustrating how the structure of regulatory landscapes constrains large-scale genome evolution. We show that the evolution of cis-regulatory landscapes, measured in terms of distal element sequences, synteny, or contacts with target genes, is significantly associated with gene expression evolution.
Affiliation(s)
- Alexandre Laverre
- Université de Lyon, Université Claude Bernard Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive
| | - Eric Tannier
- Université de Lyon, Université Claude Bernard Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive, Centre de recherche Inria de Lyon
| | | |
|
22
|
Yousefi S, Deng R, Lanko K, Salsench EM, Nikoncuk A, van der Linde HC, Perenthaler E, van Ham TJ, Mulugeta E, Barakat TS. Comprehensive multi-omics integration identifies differentially active enhancers during human brain development with clinical relevance. Genome Med 2021; 13:162. [PMID: 34663447 PMCID: PMC8524963 DOI: 10.1186/s13073-021-00980-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 04/28/2021] [Accepted: 09/29/2021] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND Non-coding regulatory elements (NCREs), such as enhancers, play a crucial role in gene regulation, and genetic aberrations in NCREs can lead to human disease, including brain disorders. The human brain is a complex organ that is susceptible to numerous disorders; many of these are caused by genetic changes, but a multitude remain currently unexplained. Understanding NCREs acting during brain development has the potential to shed light on previously unrecognized genetic causes of human brain disease. Despite immense community-wide efforts to understand the role of the non-coding genome and NCREs, annotating functional NCREs remains challenging. METHODS Here we performed an integrative computational analysis of virtually all currently available epigenome data sets related to human fetal brain. RESULTS Our in-depth analysis unravels 39,709 differentially active enhancers (DAEs) that show dynamic epigenomic rearrangement during early stages of human brain development, indicating likely biological function. Many of these DAEs are linked to clinically relevant genes, and functional validation of selected DAEs in cell models and zebrafish confirms their role in gene regulation. Compared to enhancers without dynamic epigenomic rearrangement, DAEs are subjected to higher sequence constraints in humans, have distinct sequence characteristics and are bound by a distinct transcription factor landscape. DAEs are enriched for GWAS loci for brain-related traits and for genetic variation found in individuals with neurodevelopmental disorders, including autism. CONCLUSION This compendium of high-confidence enhancers will assist in deciphering the mechanism behind developmental genetics of human brain and will be relevant to uncover missing heritability in human genetic brain disorders.
Affiliation(s)
- Soheil Yousefi
- Department of Clinical Genetics, Erasmus MC University Medical Center, Rotterdam, The Netherlands
| | - Ruizhi Deng
- Department of Clinical Genetics, Erasmus MC University Medical Center, Rotterdam, The Netherlands
| | - Kristina Lanko
- Department of Clinical Genetics, Erasmus MC University Medical Center, Rotterdam, The Netherlands
| | - Eva Medico Salsench
- Department of Clinical Genetics, Erasmus MC University Medical Center, Rotterdam, The Netherlands
| | - Anita Nikoncuk
- Department of Clinical Genetics, Erasmus MC University Medical Center, Rotterdam, The Netherlands
| | - Herma C. van der Linde
- Department of Clinical Genetics, Erasmus MC University Medical Center, Rotterdam, The Netherlands
| | - Elena Perenthaler
- Department of Clinical Genetics, Erasmus MC University Medical Center, Rotterdam, The Netherlands
| | - Tjakko J. van Ham
- Department of Clinical Genetics, Erasmus MC University Medical Center, Rotterdam, The Netherlands
| | - Eskeatnaf Mulugeta
- Department of Cell Biology, Erasmus MC University Medical Center, Rotterdam, The Netherlands
| | - Tahsin Stefan Barakat
- Department of Clinical Genetics, Erasmus MC University Medical Center, Rotterdam, The Netherlands
| |
|
23
|
Schmidt F, Marx A, Baumgarten N, Hebel M, Wegner M, Kaulich M, Leisegang M, Brandes R, Göke J, Vreeken J, Schulz M. Integrative analysis of epigenetics data identifies gene-specific regulatory elements. Nucleic Acids Res 2021; 49:10397-10418. [PMID: 34508352 PMCID: PMC8501997 DOI: 10.1093/nar/gkab798] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 10/11/2020] [Revised: 08/01/2021] [Accepted: 09/07/2021] [Indexed: 12/19/2022] Open
Abstract
Understanding how epigenetic variation in non-coding regions is involved in distal gene-expression regulation is an important problem. Regulatory regions can be associated with genes using large-scale datasets of epigenetic and expression data. However, for regions of complex epigenomic signals and enhancers that regulate many genes, it is difficult to understand these associations. We present StitchIt, an approach to dissect epigenetic variation in a gene-specific manner for the detection of regulatory elements (REMs) without relying on peak calls in individual samples. StitchIt segments epigenetic signal tracks over many samples to generate the location and the target genes of a REM simultaneously. We show that this approach leads to a more accurate and refined REM detection compared to standard methods even on heterogeneous datasets, which are challenging to model. Also, StitchIt REMs are highly enriched in experimentally determined chromatin interactions and expression quantitative trait loci. We validated several newly predicted REMs using CRISPR-Cas9 experiments, thereby demonstrating the reliability of StitchIt. StitchIt is able to dissect regulation in superenhancers and predicts thousands of putative REMs that go unnoticed using peak-based approaches, suggesting that a large part of the regulome might be uncharted waters.
Affiliation(s)
- Florian Schmidt
- Cluster of Excellence for Multimodal Computing and Interaction, Saarland University, Saarland Informatics Campus, 66123 Saarbrücken, Germany
- Max Planck Institute for Informatics, Saarland Informatics Campus, 66123 Saarbrücken, Germany
- Graduate School of Computer Science, Saarland Informatics Campus, 66123 Saarbrücken, Germany
- Laboratory of Systems Biology and Data Analytics, Genome Institute of Singapore, 60 Biopolis Street, 138672 Singapore, Singapore
| | - Alexander Marx
- Cluster of Excellence for Multimodal Computing and Interaction, Saarland University, Saarland Informatics Campus, 66123 Saarbrücken, Germany
- Max Planck Institute for Informatics, Saarland Informatics Campus, 66123 Saarbrücken, Germany
- Graduate School of Computer Science, Saarland Informatics Campus, 66123 Saarbrücken, Germany
- International Max Planck Research School for Computer Science, Saarland Informatics Campus, 66123 Saarbrücken, Germany
| | - Nina Baumgarten
- Institute for Cardiovascular Regeneration, Goethe University, 60590 Frankfurt am Main, Germany
- German Center for Cardiovascular Research (DZHK), Partner site RheinMain, 60590 Frankfurt am Main, Germany
| | - Marie Hebel
- Institute of Biochemistry II, Goethe University Frankfurt - Medical Faculty, University Hospital, 60590 Frankfurt am Main, Germany
| | - Martin Wegner
- Institute of Biochemistry II, Goethe University Frankfurt - Medical Faculty, University Hospital, 60590 Frankfurt am Main, Germany
| | - Manuel Kaulich
- Institute of Biochemistry II, Goethe University Frankfurt - Medical Faculty, University Hospital, 60590 Frankfurt am Main, Germany
- Frankfurt Cancer Institute, Goethe University, 60590 Frankfurt am Main, Germany
| | - Matthias S Leisegang
- German Center for Cardiovascular Research (DZHK), Partner site RheinMain, 60590 Frankfurt am Main, Germany
- Institute for Cardiovascular Physiology, Goethe University, 60590 Frankfurt am Main, Germany
| | - Ralf P Brandes
- German Center for Cardiovascular Research (DZHK), Partner site RheinMain, 60590 Frankfurt am Main, Germany
- Institute for Cardiovascular Physiology, Goethe University, 60590 Frankfurt am Main, Germany
| | - Jonathan Göke
- Laboratory of Computational Transcriptomics, Genome Institute of Singapore, 60 Biopolis Street, 138672 Singapore, Singapore
| | - Jilles Vreeken
- CISPA Helmholtz Center for Information Security, Saarland Informatics Campus, 66123 Saarbrücken, Germany
- Cluster of Excellence for Multimodal Computing and Interaction, Saarland University, Saarland Informatics Campus, 66123 Saarbrücken, Germany
- Max Planck Institute for Informatics, Saarland Informatics Campus, 66123 Saarbrücken, Germany
| | - Marcel H Schulz
- Cluster of Excellence for Multimodal Computing and Interaction, Saarland University, Saarland Informatics Campus, 66123 Saarbrücken, Germany
- Max Planck Institute for Informatics, Saarland Informatics Campus, 66123 Saarbrücken, Germany
- Institute for Cardiovascular Regeneration, Goethe University, 60590 Frankfurt am Main, Germany
- German Center for Cardiovascular Research (DZHK), Partner site RheinMain, 60590 Frankfurt am Main, Germany
24
Wang H, Huang B, Wang J. Predict long-range enhancer regulation based on protein-protein interactions between transcription factors. Nucleic Acids Res 2021; 49:10347-10368. [PMID: 34570239 PMCID: PMC8501976 DOI: 10.1093/nar/gkab841] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/20/2021] [Revised: 08/10/2021] [Accepted: 09/10/2021] [Indexed: 12/18/2022] Open
Abstract
Long-range regulation by distal enhancers plays critical roles in cell-type specific transcriptional programs. Computational predictions of genome-wide enhancer-promoter interactions are still challenging due to limited accuracy and the lack of knowledge on the molecular mechanisms. Based on recent biological investigations, the protein-protein interactions (PPIs) between transcription factors (TFs) have been found to participate in the regulation of chromatin loops. Therefore, we developed a novel predictive model for cell-type specific enhancer-promoter interactions by leveraging the information of TF PPI signatures. Evaluated by a series of rigorous performance comparisons, the new model achieves superior performance over other methods. The model also identifies specific TF PPIs that may mediate long-range regulatory interactions, revealing new mechanistic understandings of enhancer regulation. The prioritized TF PPIs are associated with genes in distinct biological pathways, and the predicted enhancer-promoter interactions are strongly enriched with cis-eQTLs. Most interestingly, the model discovers enhancer-mediated trans-regulatory links between TFs and genes, which are significantly enriched with trans-eQTLs. The new predictive model, along with the genome-wide analyses, provides a platform to systematically delineate the complex interplay among TFs, enhancers and genes in long-range regulation. The novel predictions also lead to mechanistic interpretations of eQTLs to decode the genetic associations with gene expression.
Affiliation(s)
- Hao Wang
- Department of Computational Mathematics, Science and Engineering, Michigan State University, 428 S. Shaw Ln., East Lansing, MI 48824, USA
| | - Binbin Huang
- Department of Computational Mathematics, Science and Engineering, Michigan State University, 428 S. Shaw Ln., East Lansing, MI 48824, USA
| | - Jianrong Wang
- Department of Computational Mathematics, Science and Engineering, Michigan State University, 428 S. Shaw Ln., East Lansing, MI 48824, USA
25
Salviato E, Djordjilović V, Hariprakash JM, Tagliaferri I, Pal K, Ferrari F. Leveraging three-dimensional chromatin architecture for effective reconstruction of enhancer-target gene regulatory interactions. Nucleic Acids Res 2021; 49:e97. [PMID: 34197622 PMCID: PMC8464068 DOI: 10.1093/nar/gkab547] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/16/2021] [Revised: 06/07/2021] [Accepted: 06/17/2021] [Indexed: 12/23/2022] Open
Abstract
A growing body of evidence in the literature suggests that germline sequence variants and somatic mutations in non-coding distal regulatory elements may be crucial for defining disease risk and prognostic stratification of patients, in genetic disorders as well as in cancer. Their functional interpretation is challenging because genome-wide enhancer-target gene (ETG) pairing is an open problem in genomics. The solutions proposed so far do not account for the hierarchy of structural domains which define chromatin three-dimensional (3D) architecture. Here we introduce a change of perspective based on the definition of multi-scale structural chromatin domains, integrated in a statistical framework to define ETG pairs. In this work (i) we develop a computational and statistical framework to reconstruct a comprehensive map of ETG pairs leveraging functional genomics data; (ii) we demonstrate that the incorporation of chromatin 3D architecture information improves ETG pairing accuracy; and (iii) we use multiple experimental datasets to extensively benchmark our method against previous solutions for the genome-wide reconstruction of ETG pairs. This solution will facilitate the annotation and interpretation of sequence variants in distal non-coding regulatory elements. We expect this to be especially helpful in clinically oriented applications of whole genome sequencing in cancer and undiagnosed genetic diseases research.
Affiliation(s)
- Elisa Salviato
- IFOM, the FIRC Institute of Molecular Oncology, Milan 20139, Italy
| | - Vera Djordjilović
- Department of Economics, Ca’ Foscari University of Venice, Venice 30100, Italy
| | | | | | - Koustav Pal
- IFOM, the FIRC Institute of Molecular Oncology, Milan 20139, Italy
| | - Francesco Ferrari
- IFOM, the FIRC Institute of Molecular Oncology, Milan 20139, Italy
- Institute of Molecular Genetics “Luigi Luca Cavalli-Sforza”, National Research Council, Pavia 27100, Italy
26
González-Ramírez M, Ballaré C, Mugianesi F, Beringer M, Santanach A, Blanco E, Di Croce L. Differential contribution to gene expression prediction of histone modifications at enhancers or promoters. PLoS Comput Biol 2021; 17:e1009368. [PMID: 34473698 PMCID: PMC8443064 DOI: 10.1371/journal.pcbi.1009368] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 06/11/2021] [Revised: 09/15/2021] [Accepted: 08/21/2021] [Indexed: 12/31/2022] Open
Abstract
The ChIP-seq signal of histone modifications at promoters is a good predictor of gene expression in different cellular contexts, but whether this is also true at enhancers is not clear. To address this issue, we develop quantitative models to characterize the relationship of gene expression with histone modifications at enhancers or promoters. We use embryonic stem cells (ESCs), which contain a full spectrum of active and repressed (poised) enhancers, to train predictive models. As many poised enhancers in ESCs switch towards an active state during differentiation, predictive models can also be trained on poised enhancers throughout differentiation and in development. Remarkably, we determine that histone modifications at enhancers, as well as promoters, are predictive of gene expression in ESCs and throughout differentiation and development. Importantly, we demonstrate that their contribution to the predictive models varies depending on their location in enhancers or promoters. Moreover, we use a local regression (LOESS) to normalize sequencing data from different sources, which allows us to apply predictive models trained in a specific cellular context to a different one. We conclude that the relationship between gene expression and histone modifications at enhancers is universal and different from promoters. Our study provides new insight into how histone modifications relate to gene expression based on their location in enhancers or promoters.
Gene expression can be properly predicted by the ChIP-seq signal of histone modifications at promoters, but whether this is also true at enhancers is unclear. In this study we develop predictive models of gene expression that demonstrate the predictive power of histone modifications at enhancers in the context of mouse embryonic stem cells, during differentiation, and in animal development. Moreover, by assessing the contribution of each histone modification, we found that enhancer predictive models and promoter predictive models have different histone modification requirements. Therefore, different histone modifications relate better to enhancer or promoter function(s). Finally, by applying predictive models trained in a specific cellular context to a different one, we concluded that the relationship between gene expression and histone modifications at enhancers is universal.
Affiliation(s)
- Mar González-Ramírez
- Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
| | - Cecilia Ballaré
- Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
| | - Francesca Mugianesi
- Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Malte Beringer
- Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
| | - Alexandra Santanach
- Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
| | - Enrique Blanco
- Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
| | - Luciano Di Croce
- Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- ICREA, Pg. Barcelona, Spain
27
Tanaka N, Koido M, Suzuki A, Otomo N, Suetsugu H, Kochi Y, Tomizuka K, Momozawa Y, Kamatani Y, Ikegawa S, Yamamoto K, Terao C. Eight novel susceptibility loci and putative causal variants in atopic dermatitis. J Allergy Clin Immunol 2021; 148:1293-1306. [PMID: 34116867 DOI: 10.1016/j.jaci.2021.04.019] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 09/30/2020] [Revised: 03/03/2021] [Accepted: 04/08/2021] [Indexed: 01/09/2023]
Abstract
BACKGROUND: Atopic dermatitis (AD) is the most common allergic disease in the world. While genetic components play critical roles in its pathophysiology, a large proportion of its genetic background is still unexplored.
OBJECTIVES: This study sought to illuminate the genetic associations with AD using a genome-wide association study (GWAS) and its downstream analyses.
METHODS: This study conducted a GWAS for AD comprising 2,639 cases and 115,648 controls in the Japanese population, followed by a trans-ethnic meta-analysis with UK Biobank data and downstream analyses including partitioning heritability analysis by linkage disequilibrium score regression.
RESULTS: This study identified 17 significant susceptibility loci, among which 4 loci (AFF1, ITGB8, EHMT1, and EGR2) were novel in the Japanese GWAS. The trans-ethnic meta-analysis revealed 4 additional novel loci, namely ZBTB38, LOC105755953/LOC101928272, TRAF3, and IQGAP1. This study found a missense variant (R243W) with a deleterious functional effect in NLRP10 and a variant altering expression of CCDC80 via enhancer expression as highly likely causal variants. These 2 regions were Asian-specific, and these population-specific associations could be explained by the frequency of causal variants. The gene-based test showed SMAD4 as an additional novel significant locus. Downstream analyses revealed substantial overlap of GWAS significant signals in enhancers of skin cells and immune cells, especially CD4 T cells. A highly shared polygenic architecture of AD between Europeans and Asians was also found.
CONCLUSIONS: This study identified Japanese-specific loci and novel significant loci shared by different populations. Two putative causal variants were illuminated in Japanese-specific loci. Trans-ethnic analyses revealed strong heritability enrichment in immune-related pathways, and relevant cell types shared among populations.
Affiliation(s)
- Nao Tanaka
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan; Department of Rheumatology, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
| | - Masaru Koido
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan; Division of Molecular Pathology, Department of Cancer Biology, Institute of Medical Science, The University of Tokyo, Tokyo, Japan
| | - Akari Suzuki
- Laboratory for Autoimmune Diseases, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
| | - Nao Otomo
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan; Laboratory for Bone and Joint Diseases, RIKEN Center for Medical Sciences, Tokyo, Japan; Department of Orthopedic Surgery, Keio University School of Medicine, Tokyo, Japan
| | - Hiroyuki Suetsugu
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan; Laboratory for Bone and Joint Diseases, RIKEN Center for Medical Sciences, Tokyo, Japan; Department of Orthopedic Surgery, Graduate School of Medical Sciences, Kyushu University, Fukuoka, Japan
| | - Yuta Kochi
- Laboratory for Autoimmune Diseases, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan; Department of Genomic Function and Diversity, Medical Research Institute, Tokyo Medical and Dental University, Tokyo, Japan
| | - Kouhei Tomizuka
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
| | - Yukihide Momozawa
- Laboratory for Genotyping Development, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
| | - Yoichiro Kamatani
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan; Laboratory of Complex Trait Genomics, Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
| | -
- Institute of Medical Science, The University of Tokyo, Tokyo, Japan
| | - Shiro Ikegawa
- Laboratory for Bone and Joint Diseases, RIKEN Center for Medical Sciences, Tokyo, Japan
| | - Kazuhiko Yamamoto
- Laboratory for Autoimmune Diseases, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
| | - Chikashi Terao
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan; Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan; Department of Applied Genetics, School of Pharmaceutical Sciences, University of Shizuoka, Shizuoka, Japan.
28
Asada K, Kaneko S, Takasawa K, Machino H, Takahashi S, Shinkai N, Shimoyama R, Komatsu M, Hamamoto R. Integrated Analysis of Whole Genome and Epigenome Data Using Machine Learning Technology: Toward the Establishment of Precision Oncology. Front Oncol 2021; 11:666937. [PMID: 34055633 PMCID: PMC8149908 DOI: 10.3389/fonc.2021.666937] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 02/11/2021] [Accepted: 04/26/2021] [Indexed: 12/17/2022] Open
Abstract
With the completion of the International Human Genome Project, we have entered what is known as the post-genome era, and efforts to apply genomic information to medicine have become more active. In particular, with the announcement of the Precision Medicine Initiative by U.S. President Barack Obama in his State of the Union address at the beginning of 2015, "precision medicine," which aims to divide patients and potential patients into subgroups with respect to disease susceptibility, has become the focus of worldwide attention. The field of oncology is also actively adopting the precision oncology approach, which is based on molecular profiling, such as genomic information, to select the appropriate treatment. However, current precision oncology is dominated by the targeted-gene panel (TGP) method, which uses next-generation sequencing (NGS) to analyze a limited number of specific cancer-related genes and suggest optimal treatments, so only a limited number of patients benefit from it. To steadily develop precision oncology, it is necessary to integrate and analyze more detailed omics data, such as whole genome data and epigenome data. On the other hand, with the advancement of analysis technologies such as NGS, the amount of data obtained by omics analysis has become enormous, and artificial intelligence (AI) technologies, mainly machine learning (ML) technologies, are being actively used to make more efficient and accurate predictions. In this review, we will focus on whole genome sequencing (WGS) analysis and epigenome analysis, introduce the latest results of omics analysis using ML technologies for the development of precision oncology, and discuss the future prospects.
Affiliation(s)
- Ken Asada
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
- Division of Medical AI Research and Development, National Cancer Center Research Institute, Tokyo, Japan
| | - Syuzo Kaneko
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
- Division of Medical AI Research and Development, National Cancer Center Research Institute, Tokyo, Japan
| | - Ken Takasawa
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
- Division of Medical AI Research and Development, National Cancer Center Research Institute, Tokyo, Japan
| | - Hidenori Machino
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
- Division of Medical AI Research and Development, National Cancer Center Research Institute, Tokyo, Japan
| | - Satoshi Takahashi
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
- Division of Medical AI Research and Development, National Cancer Center Research Institute, Tokyo, Japan
| | - Norio Shinkai
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
- Division of Medical AI Research and Development, National Cancer Center Research Institute, Tokyo, Japan
- Department of NCC Cancer Science, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
| | - Ryo Shimoyama
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
- Division of Medical AI Research and Development, National Cancer Center Research Institute, Tokyo, Japan
| | - Masaaki Komatsu
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
- Division of Medical AI Research and Development, National Cancer Center Research Institute, Tokyo, Japan
| | - Ryuji Hamamoto
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
- Division of Medical AI Research and Development, National Cancer Center Research Institute, Tokyo, Japan
- Department of NCC Cancer Science, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
29
Giudicelli F, Roest Crollius H. On the importance of evolutionary constraint for regulatory sequence identification. Brief Funct Genomics 2021:elab015. [PMID: 33754633 DOI: 10.1093/bfgp/elab015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2020] [Revised: 01/15/2021] [Accepted: 02/19/2021] [Indexed: 11/13/2022] Open
Abstract
Regulation of gene expression relies on the activity of specialized genomic elements, enhancers or silencers, sometimes located at large distances from their target gene promoters. A significant part of vertebrate genomes consists of such regulatory elements, but their identification and that of their target genes remain challenging, due to the lack of a clear signature at the nucleotide level. For many years the main hallmark used for identifying functional elements has been their sequence conservation between genomes of distant species, indicative of purifying selection. More recently, genome-wide biochemical assays have opened new avenues for detecting regulatory regions, shifting attention away from evolutionary constraints. Here, we review the respective contributions of comparative genomics and biochemical assays to the definition of regulatory elements and their targets, and advocate that both sequence conservation and preserved synteny, taken as signatures of functional constraint, remain essential tools in this task.
30
Roller M, Stamper E, Villar D, Izuogu O, Martin F, Redmond AM, Ramachanderan R, Harewood L, Odom DT, Flicek P. LINE retrotransposons characterize mammalian tissue-specific and evolutionarily dynamic regulatory regions. Genome Biol 2021; 22:62. [PMID: 33602314 PMCID: PMC7890895 DOI: 10.1186/s13059-021-02260-y] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 06/09/2020] [Accepted: 01/04/2021] [Indexed: 12/16/2022] Open
Abstract
BACKGROUND: To investigate the mechanisms driving regulatory evolution across tissues, we experimentally mapped promoters, enhancers, and gene expression in the liver, brain, muscle, and testis from ten diverse mammals.
RESULTS: The regulatory landscape around genes included both tissue-shared and tissue-specific regulatory regions, where tissue-specific promoters and enhancers evolved most rapidly. Genomic regions switching between promoters and enhancers were more common across species, and less common across tissues within a single species. Long Interspersed Nuclear Elements (LINEs) played recurrent evolutionary roles: LINE L1s were associated with tissue-specific regulatory regions, whereas more ancient LINE L2s were associated with tissue-shared regulatory regions and with those switching between promoter and enhancer signatures across species.
CONCLUSIONS: Our analyses of the tissue-specificity and evolutionary stability among promoters and enhancers reveal how specific LINE families have helped shape the dynamic mammalian regulome.
Affiliation(s)
- Maša Roller
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Ericca Stamper
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Cancer Research UK Cambridge Institute, University of Cambridge, Robinson Way, Cambridge, CB2 0RE, UK
- Present address: Harriet L. Wilkes Honors College, Florida Atlantic University, Jupiter, FL, 33458, USA
| | - Diego Villar
- Cancer Research UK Cambridge Institute, University of Cambridge, Robinson Way, Cambridge, CB2 0RE, UK
- Present address: Blizard Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, E1 2AT, UK
| | - Osagie Izuogu
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Fergal Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Aisling M Redmond
- Cancer Research UK Cambridge Institute, University of Cambridge, Robinson Way, Cambridge, CB2 0RE, UK
- Present address: MRC Cancer Unit, Hutchison-MRC Research Centre, University of Cambridge, Cambridge, CB2 0XZ, UK
| | - Raghavendra Ramachanderan
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Louise Harewood
- Cancer Research UK Cambridge Institute, University of Cambridge, Robinson Way, Cambridge, CB2 0RE, UK
- Present address: Precision Medicine Centre of Excellence, Queen's University Belfast, Belfast, BT9 7AE, UK
| | - Duncan T Odom
- Cancer Research UK Cambridge Institute, University of Cambridge, Robinson Way, Cambridge, CB2 0RE, UK.
- German Cancer Research Center (DKFZ), Division of Regulatory Genomics and Cancer Evolution, Im Neuenheimer Feld 280, 69120, Heidelberg, Germany.
| | - Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.
- Cancer Research UK Cambridge Institute, University of Cambridge, Robinson Way, Cambridge, CB2 0RE, UK.
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.
31
Tao H, Li H, Xu K, Hong H, Jiang S, Du G, Wang J, Sun Y, Huang X, Ding Y, Li F, Zheng X, Chen H, Bo X. Computational methods for the prediction of chromatin interaction and organization using sequence and epigenomic profiles. Brief Bioinform 2021; 22:6102668. [PMID: 33454752 PMCID: PMC8424394 DOI: 10.1093/bib/bbaa405] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 08/21/2020] [Revised: 11/26/2020] [Accepted: 12/10/2020] [Indexed: 12/14/2022] Open
Abstract
The exploration of three-dimensional chromatin interaction and organization provides insight into mechanisms underlying gene regulation, cell differentiation and disease development. Advances in chromosome conformation capture technologies, such as high-throughput chromosome conformation capture (Hi-C) and chromatin interaction analysis by paired-end tag (ChIA-PET), have enabled the exploration of chromatin interaction and organization. However, high-resolution Hi-C and ChIA-PET data are only available for a limited number of cell lines, and their acquisition is costly, time consuming, laborious and affected by theoretical limitations. Increasing evidence shows that DNA sequence and epigenomic features are informative predictors of regulatory interaction and chromatin architecture. Based on these features, numerous computational methods have been developed for the prediction of chromatin interaction and organization, but they are not yet extensively applied in biomedical studies. A systematic study to summarize and evaluate such methods is still needed to facilitate their application. Here, we summarize 48 computational methods for the prediction of chromatin interaction and organization using sequence and epigenomic profiles, categorize them and compare their performance. In addition, we provide a comprehensive guideline for the selection of suitable methods to predict chromatin interaction and organization based on the available data and the biological question of interest.
Affiliation(s)
- Huan Tao
- Beijing Institute of Radiation Medicine
| | - Hao Li
- Beijing Institute of Radiation Medicine
| | - Kang Xu
- Beijing Institute of Radiation Medicine
| | - Hao Hong
- Beijing Institute of Radiation Medicine, Department of Biotechnology
| | - Shuai Jiang
- Beijing Institute of Radiation Medicine, Department of Biotechnology
| | - Guifang Du
- Beijing Institute of Radiation Medicine, Department of Biotechnology
| | | | - Yu Sun
- Beijing Institute of Radiation Medicine, Department of Biotechnology
| | - Xin Huang
- Beijing Institute of Radiation Medicine, Department of Biotechnology
| | - Yang Ding
- Beijing Institute of Radiation Medicine
| | - Fei Li
- Chinese Academy of Sciences, Department of Computer Network Information Center
32
Liu L, Zhang LR, Dao FY, Yang YC, Lin H. A computational framework for identifying the transcription factors involved in enhancer-promoter loop formation. Mol Ther Nucleic Acids 2020; 23:347-354. [PMID: 33425492 PMCID: PMC7779541 DOI: 10.1016/j.omtn.2020.11.011] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 07/13/2020] [Accepted: 11/11/2020] [Indexed: 12/30/2022]
Abstract
The pairwise interaction between transcription factors (TFs) plays an important role in enhancer-promoter loop formation. Although thousands of TFs in the human genome have been found, only a few TF pairs have been demonstrated to be related to loop formation. It is still a challenge to determine which TF pairs could be involved in the enhancer-promoter regulation network. This work describes a computational framework to identify TF pairs in enhancer-promoter regulation. By integrating different levels of data derived from Promoter Capture Hi-C, chromatin immunoprecipitation sequencing (ChIP-seq) of histone marks, RNA-seq, protein-protein interaction (PPI), and TF motifs, we identified 361 significant TF pairs and constructed a TF interaction network. From the network, we found several hub-TFs, which may have important roles in the regulation of long-range interactions. Our study extends the TF pairs identified by other experimental and computational approaches. These findings will help the further study of long-range interactions between enhancers and promoters.
Affiliation(s)
- Li Liu
- Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China
| | - Li-Rong Zhang
- Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China
| | - Fu-Ying Dao
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Yan-Chao Yang
- Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China
| | - Hao Lin
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
33
Baur B, Shin J, Zhang S, Roy S. Data integration for inferring context-specific gene regulatory networks. Curr Opin Syst Biol 2020; 23:38-46. [PMID: 33225112 PMCID: PMC7676633 DOI: 10.1016/j.coisb.2020.09.005] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/28/2022]
Abstract
Transcriptional regulatory networks control context-specific gene expression patterns and play important roles in normal and disease processes. Advances in genomics are rapidly increasing our ability to measure different components of the regulation machinery at the single-cell and bulk population level. An important challenge is to combine different types of regulatory genomic measurements to construct a more complete picture of gene regulatory networks across different disease, environmental, and developmental contexts. In this review, we focus on recent computational methods that integrate regulatory genomic data sets to infer context specificity and dynamics in regulatory networks.
Affiliation(s)
- Brittany Baur
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, 53715, USA
- Junha Shin
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, 53715, USA
- Shilu Zhang
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, 53715, USA
- Sushmita Roy
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, 53715, USA
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, 53715, USA
|
34
|
Sammons MA, Nguyen TAT, McDade SS, Fischer M. Tumor suppressor p53: from engaging DNA to target gene regulation. Nucleic Acids Res 2020; 48:8848-8869. [PMID: 32797160 PMCID: PMC7498329 DOI: 10.1093/nar/gkaa666] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 05/24/2020] [Revised: 07/24/2020] [Accepted: 07/30/2020] [Indexed: 12/13/2022]
Abstract
The p53 transcription factor confers its potent tumor suppressor functions primarily through the regulation of a large network of target genes. The recent explosion of next generation sequencing protocols has enabled the study of the p53 gene regulatory network (GRN) and underlying mechanisms at an unprecedented depth and scale, helping us to understand precisely how p53 controls gene regulation. Here, we discuss our current understanding of where and how p53 binds to DNA and chromatin, its pioneer-like role, and how this affects gene regulation. We provide an overview of the p53 GRN and the direct and indirect mechanisms through which p53 affects gene regulation. In particular, we focus on delineating the ubiquitous and cell type-specific network of regulatory elements that p53 engages; reviewing our understanding of how, where, and when p53 binds to DNA and the mechanisms through which these events regulate transcription. Finally, we discuss the evolution of the p53 GRN and how recent work has revealed remarkable differences between vertebrates, which are of particular importance to cancer researchers using mouse models.
Affiliation(s)
- Morgan A Sammons
  - Department of Biological Sciences and The RNA Institute, University at Albany, State University of New York, 1400 Washington Avenue, Albany, NY 12222, USA
- Thuy-Ai T Nguyen
  - Genome Integrity & Structural Biology Laboratory and Immunity, Inflammation and Disease Laboratory, National Institute of Environmental Health Sciences/National Institutes of Health, 111 TW Alexander Drive, Research Triangle Park, NC 27709, USA
- Simon S McDade
  - Patrick G Johnston Centre for Cancer Research, Queen's University Belfast, 97 Lisburn Road, Belfast BT9 7AE, UK
- Martin Fischer
  - Computational Biology Group, Leibniz Institute on Aging – Fritz Lipmann Institute (FLI), Beutenbergstraße 11, 07745 Jena, Germany
|
35
|
Macedo A, Gontijo AM. The intersectional genetics landscape for humans. Gigascience 2020; 9:giaa083. [PMID: 32761099 PMCID: PMC7407247 DOI: 10.1093/gigascience/giaa083] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/27/2019] [Revised: 04/05/2020] [Accepted: 07/08/2020] [Indexed: 11/29/2022]
Abstract
BACKGROUND The human body is made up of hundreds, perhaps thousands, of cell types and states, most of which are currently inaccessible genetically. Intersectional genetic approaches can increase the number of genetically accessible cells, but the scope and safety of these approaches have not been systematically assessed. A typical intersectional method acts like an "AND" logic gate by converting the input of 2 or more active, yet unspecific, regulatory elements (REs) into a single cell type-specific synthetic output. RESULTS Here, we systematically assessed the intersectional genetics landscape of the human genome using a subset of cells from a large RE usage atlas (Functional ANnoTation Of the Mammalian genome 5 consortium, FANTOM5) obtained by cap analysis of gene expression sequencing (CAGE-seq). We developed the heuristics and algorithms to retrieve and quality-rank "AND" gate intersections. Of the 154 primary cell types surveyed, >90% can be distinguished from each other with as few as 3 to 4 active REs, with quantifiable safety and robustness. We call these minimal intersections of active REs with cell-type diagnostic potential "versatile entry codes" (VEnCodes). Each of the 158 cancer cell types surveyed could also be distinguished from the healthy primary cell types with small VEnCodes, most of which were robust to intra- and interindividual variation. Methods for the cross-validation of CAGE-seq-derived VEnCodes and for the extraction of VEnCodes from pooled single-cell sequencing data are also presented. CONCLUSIONS Our work provides a systematic view of the intersectional genetics landscape in humans and demonstrates the potential of these approaches for future gene delivery technologies.
Affiliation(s)
- Andre Macedo
  - Chronic Diseases Research Center, NOVA Medical School, Faculdade de Ciências Médicas, Universidade Nova de Lisboa, Rua do Instituto Bacteriológico 5, 1150–190, Lisbon, Portugal
- Alisson M Gontijo
  - Chronic Diseases Research Center, NOVA Medical School, Faculdade de Ciências Médicas, Universidade Nova de Lisboa, Rua do Instituto Bacteriológico 5, 1150–190, Lisbon, Portugal
|
36
|
Clément Y, Torbey P, Gilardi-Hebenstreit P, Crollius HR. Enhancer-gene maps in the human and zebrafish genomes using evolutionary linkage conservation. Nucleic Acids Res 2020; 48:2357-2371. [PMID: 31943068 PMCID: PMC7049698 DOI: 10.1093/nar/gkz1199] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 05/02/2019] [Revised: 12/11/2019] [Accepted: 12/17/2019] [Indexed: 12/14/2022]
Abstract
The spatiotemporal expression of genes is controlled by enhancer sequences that bind transcription factors. Identifying the target genes of enhancers remains difficult because enhancers regulate gene expression over long genomic distances. To address this, we used an evolutionary approach to build two genome-wide maps of predicted enhancer-gene associations in the human and zebrafish genomes. Evolutionarily conserved sequences were linked to their predicted target genes using PEGASUS, a bioinformatics method that relies on evolutionary conservation of synteny. The analysis of these maps revealed that the number of predicted enhancers linked to a gene correlates with its expression breadth. Comparison of both maps identified hundreds of putative vertebrate ancestral regulatory relationships from which we could determine that predicted enhancer-gene distances scale with genome size despite strong positional conservation. The two maps represent a resource for further studies, including the prioritization of sequence variants in whole-genome sequences of patients affected by genetic diseases.
Affiliation(s)
- Yves Clément
  - École Normale Supérieure, PSL Research University, CNRS, Inserm, Institut de Biologie de l'École Normale Supérieure (IBENS), F-75005 Paris, France
  - To whom correspondence should be addressed. Tel: +33 1 57 27 80 35
- Patrick Torbey
  - École Normale Supérieure, PSL Research University, CNRS, Inserm, Institut de Biologie de l'École Normale Supérieure (IBENS), F-75005 Paris, France
- Pascale Gilardi-Hebenstreit
  - École Normale Supérieure, PSL Research University, CNRS, Inserm, Institut de Biologie de l'École Normale Supérieure (IBENS), F-75005 Paris, France
- Hugues Roest Crollius
  - École Normale Supérieure, PSL Research University, CNRS, Inserm, Institut de Biologie de l'École Normale Supérieure (IBENS), F-75005 Paris, France
  - Correspondence may also be addressed to Hugues Roest Crollius. Tel: +33 1 44 32 23 70
|
37
|
Xu H, Zhang S, Yi X, Plewczynski D, Li MJ. Exploring 3D chromatin contacts in gene regulation: The evolution of approaches for the identification of functional enhancer-promoter interaction. Comput Struct Biotechnol J 2020; 18:558-570. [PMID: 32226593 PMCID: PMC7090358 DOI: 10.1016/j.csbj.2020.02.013] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 08/31/2019] [Revised: 02/21/2020] [Accepted: 02/22/2020] [Indexed: 12/12/2022]
Abstract
Mechanisms underlying gene regulation are key to understanding how multicellular organisms with various cell types develop from the same genetic blueprint. Dynamic interactions between enhancers and genes play central roles in controlling gene transcription, but the determinants that link functional enhancer-promoter pairs remain elusive. A major challenge is the lack of a reliable approach to detect and verify functional enhancer-promoter interactions (EPIs). In this review, we summarize the current methods for detecting EPIs and describe how developing techniques facilitate the identification of EPIs by assessing the merits and drawbacks of these methods. We also review recent state-of-the-art EPI prediction methods in terms of their rationale, data usage and characterization. Furthermore, we briefly discuss the evolving strategies for validating functional EPIs.
Affiliation(s)
- Hang Xu
  - 2011 Collaborative Innovation Center of Tianjin for Medical Epigenetics, Tianjin Key Laboratory of Medical Epigenetics, Tianjin Medical University, Tianjin, China
- Shijie Zhang
  - 2011 Collaborative Innovation Center of Tianjin for Medical Epigenetics, Tianjin Key Laboratory of Medical Epigenetics, Tianjin Medical University, Tianjin, China
  - Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
- Xianfu Yi
  - School of Biomedical Engineering, Tianjin Medical University, Tianjin, China
- Dariusz Plewczynski
  - Centre of New Technologies, University of Warsaw, Banacha 2c, 02-097 Warsaw, Poland
  - Faculty of Mathematics and Information Science, Warsaw University of Technology, Koszykowa 75, 00-662 Warsaw, Poland
- Mulin Jun Li
  - 2011 Collaborative Innovation Center of Tianjin for Medical Epigenetics, Tianjin Key Laboratory of Medical Epigenetics, Tianjin Medical University, Tianjin, China
  - Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
|
38
|
Schmidt F, Kern F, Schulz MH. Integrative prediction of gene expression with chromatin accessibility and conformation data. Epigenetics Chromatin 2020; 13:4. [PMID: 32029002 PMCID: PMC7003490 DOI: 10.1186/s13072-020-0327-0] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 07/13/2019] [Accepted: 01/06/2020] [Indexed: 02/06/2023]
Abstract
BACKGROUND Enhancers play a fundamental role in orchestrating cell state and development. Although several methods have been developed to identify enhancers, linking them to their target genes is still an open problem. Several theories have been proposed on the functional mechanisms of enhancers, which triggered the development of various methods to infer promoter-enhancer interactions (PEIs). The advancement of high-throughput techniques describing the three-dimensional organization of the chromatin paved the way to pinpoint long-range PEIs. Here we investigated whether including PEIs in computational models for the prediction of gene expression improves performance and interpretability. RESULTS We have extended our [Formula: see text] framework to include DNA contacts deduced from chromatin conformation capture experiments and compared various methods to determine PEIs using predictive modelling of gene expression from chromatin accessibility data and predicted transcription factor (TF) motif data. We designed a novel machine learning approach that allows the prioritization of TFs binding to distal loop and promoter regions with respect to their importance for gene expression regulation. Our analysis revealed a set of core TFs that are part of enhancer-promoter loops involving YY1 in different cell lines. CONCLUSION We present a novel approach that can be used to prioritize TFs involved in distal and promoter-proximal regulatory events by integrating chromatin accessibility, conformation, and gene expression data. We show that the integration of chromatin conformation data can improve gene expression prediction and aid model interpretability.
Affiliation(s)
- Florian Schmidt
  - High-throughput Genomics & Systems Biology, Cluster of Excellence on Multimodal Computing and Interaction, Saarland Informatics Campus, 66123 Saarbrücken, Germany
  - Computational Biology & Applied Algorithmics, Max-Planck Institute for Informatics, Saarland Informatics Campus, 66123 Saarbrücken, Germany
  - Center for Bioinformatics, Saarland Informatics Campus, 66123 Saarbrücken, Germany
  - Genome Institute of Singapore, A*STAR, 60 Biopolis Street, Singapore, 138672 Singapore
- Fabian Kern
  - High-throughput Genomics & Systems Biology, Cluster of Excellence on Multimodal Computing and Interaction, Saarland Informatics Campus, 66123 Saarbrücken, Germany
  - Center for Bioinformatics, Saarland Informatics Campus, 66123 Saarbrücken, Germany
  - Chair for Clinical Bioinformatics, Saarland Informatics Campus, 66123 Saarbrücken, Germany
- Marcel H. Schulz
  - High-throughput Genomics & Systems Biology, Cluster of Excellence on Multimodal Computing and Interaction, Saarland Informatics Campus, 66123 Saarbrücken, Germany
  - Computational Biology & Applied Algorithmics, Max-Planck Institute for Informatics, Saarland Informatics Campus, 66123 Saarbrücken, Germany
  - Center for Bioinformatics, Saarland Informatics Campus, 66123 Saarbrücken, Germany
  - Institute of Cardiovascular Regeneration, Goethe-University, Theodor-Stern-Kai 7, 60590 Frankfurt am Main, Germany
  - German Center for Cardiovascular Research, Partner Site Rhein-Main, Theodor-Stern-Kai 7, 60590 Frankfurt am Main, Germany
|
39
|
Wang X, Goldstein DB. Enhancer Domains Predict Gene Pathogenicity and Inform Gene Discovery in Complex Disease. Am J Hum Genet 2020; 106:215-233. [PMID: 32032514 DOI: 10.1016/j.ajhg.2020.01.012] [Citation(s) in RCA: 49] [Impact Index Per Article: 12.3] [Received: 08/10/2019] [Accepted: 01/13/2020] [Indexed: 02/07/2023]
Abstract
Non-coding transcriptional regulatory elements are critical for controlling the spatiotemporal expression of genes. Here, we demonstrate that the sizes and number of enhancers linked to a gene reflect its disease pathogenicity. Moreover, genes with redundant enhancer domains are depleted of cis-acting genetic variants that disrupt gene expression, and they are buffered against the effects of disruptive non-coding mutations. Our results demonstrate that dosage-sensitive genes have evolved a robustness to the disruptive effects of genetic variation by expanding their regulatory domains. This solves a puzzle about why genes associated with human disease are depleted of cis-eQTLs (cis-expression quantitative trait loci), suggesting that this relationship might complicate gene identification in causal genome-wide association studies (GWASs) using eQTL information, and establishes a framework for identifying non-coding regulatory variation with phenotypic consequences.
|
40
|
Whalen S, Pollard KS. Reply to 'Inflated performance measures in enhancer-promoter interaction-prediction methods'. Nat Genet 2020; 51:1198-1200. [PMID: 31332377 DOI: 10.1038/s41588-019-0473-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 12/20/2022]
Affiliation(s)
- Sean Whalen
  - Gladstone Institutes, San Francisco, CA, USA
- Katherine S Pollard
  - Gladstone Institutes, San Francisco, CA, USA
  - Department of Epidemiology and Biostatistics, Institute for Human Genetics, Quantitative Biology Institute, and Institute for Computational Health Sciences, University of California, San Francisco, San Francisco, CA, USA
  - Chan-Zuckerberg Biohub, San Francisco, CA, USA
|
41
|
Belokopytova PS, Nuriddinov MA, Mozheiko EA, Fishman D, Fishman V. Quantitative prediction of enhancer-promoter interactions. Genome Res 2019; 30:72-84. [PMID: 31804952 PMCID: PMC6961579 DOI: 10.1101/gr.249367.119] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Received: 02/13/2019] [Accepted: 11/25/2019] [Indexed: 11/24/2022]
Abstract
Recent experimental and computational efforts have provided large data sets describing three-dimensional organization of mouse and human genomes and showed the interconnection between the expression profile, epigenetic state, and spatial interactions of loci. These interconnections were utilized to infer the spatial organization of chromatin, including enhancer–promoter contacts, from one-dimensional epigenetic marks. Here, we show that the predictive power of some of these algorithms is overestimated due to peculiar properties of the biological data. We propose an alternative approach, which provides high-quality predictions of chromatin interactions using information on gene expression and CTCF-binding alone. Using multiple metrics, we confirmed that our algorithm could efficiently predict the three-dimensional architecture of both normal and rearranged genomes.
Affiliation(s)
- Polina S Belokopytova
  - Institute of Cytology and Genetics SB RAS 630090, Novosibirsk, Russia
  - Novosibirsk State University, Novosibirsk, Russia 630090
- Daniil Fishman
  - Novosibirsk State University, Novosibirsk, Russia 630090
- Veniamin Fishman
  - Institute of Cytology and Genetics SB RAS 630090, Novosibirsk, Russia
  - Novosibirsk State University, Novosibirsk, Russia 630090
|
42
|
Fang CH, Theera-Ampornpunt N, Roth MA, Grama A, Chaterji S. AIKYATAN: mapping distal regulatory elements using convolutional learning on GPU. BMC Bioinformatics 2019; 20:488. [PMID: 31590652 PMCID: PMC6781298 DOI: 10.1186/s12859-019-3049-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2019] [Accepted: 08/22/2019] [Indexed: 12/02/2022]
Abstract
BACKGROUND The data deluge can leverage sophisticated ML techniques for functionally annotating the regulatory non-coding genome. The challenge lies in selecting the appropriate classifier for the specific functional annotation problem, within the bounds of the hardware constraints and the model’s complexity. In our system Aikyatan, we annotate distal epigenomic regulatory sites, e.g., enhancers. Specifically, we develop a binary classifier that classifies genome sequences as distal regulatory regions or not, given their histone modifications’ combinatorial signatures. This problem is challenging because the regulatory regions are distal to the genes, with diverse signatures across classes (e.g., enhancers and insulators) and even within each class (e.g., different enhancer sub-classes). RESULTS We develop a suite of ML models, under the banner Aikyatan, including SVM models, random forest variants, and deep learning architectures, for distal regulatory element (DRE) detection. We demonstrate, with strong empirical evidence, that deep learning approaches have a computational advantage. In addition, convolutional neural networks (CNN) provide the best-in-class accuracy, superior to the vanilla variant. With the human embryonic cell line H1, CNN achieves an accuracy of 97.9% and an order of magnitude lower runtime than the kernel SVM. Running on a GPU, the training time is sped up 21x and 30x (over CPU) for DNN and CNN, respectively. Finally, our CNN model enjoys superior prediction performance vis-à-vis the competition. Specifically, Aikyatan-CNN achieved a 40% higher validation rate versus CSIANN and the same accuracy as RFECS. CONCLUSIONS Our exhaustive experiments using an array of ML tools validate the need for a model that is not only expressive but can scale with increasing data volumes and diversity. In addition, a subset of these datasets have image-like properties and benefit from spatial pooling of features. Our Aikyatan suite leverages diverse epigenomic datasets that can then be modeled using CNNs with optimized activation and pooling functions. The goal is to capture the salient features of the integrated epigenomic datasets for deciphering the distal (non-coding) regulatory elements, which have been found to be associated with functional variants. Our source code will be made publicly available at: https://bitbucket.org/cellsandmachines/aikyatan.
Affiliation(s)
- Chih-Hao Fang
  - Department of Ag. and Biological Engineering, Purdue University, West Lafayette, IN, USA
- Ananth Grama
  - Department of Ag. and Biological Engineering, Purdue University, West Lafayette, IN, USA
- Somali Chaterji
  - Department of Ag. and Biological Engineering, Purdue University, West Lafayette, IN, USA
|
43
|
Guffanti G, Bartlett A, Klengel T, Klengel C, Hunter R, Glinsky G, Macciardi F. Novel Bioinformatics Approach Identifies Transcriptional Profiles of Lineage-Specific Transposable Elements at Distinct Loci in the Human Dorsolateral Prefrontal Cortex. Mol Biol Evol 2019; 35:2435-2453. [PMID: 30053206 PMCID: PMC6188555 DOI: 10.1093/molbev/msy143] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Indexed: 12/14/2022]
Abstract
Expression of transposable elements (TE) is transiently activated during human preimplantation embryogenesis in a developmental stage- and cell type-specific manner and TE-mediated epigenetic regulation is intrinsically wired in developmental genetic networks in human embryos and embryonic stem cells. However, there are no systematic studies devoted to a comprehensive analysis of the TE transcriptome in human adult organs and tissues, including human neural tissues. To investigate TE expression in the human Dorsolateral Prefrontal Cortex (DLPFC), we developed and validated a straightforward analytical approach to chart quantitative genome-wide expression profiles of all annotated TE loci based on unambiguous mapping of discrete TE-encoded transcripts using a de novo assembly strategy. To initially evaluate the potential regulatory impact of DLPFC-expressed TE, we adopted a comparative evolutionary genomics approach across humans, primates, and rodents to document conservation patterns, lineage-specificity, and colocalizations with transcription factor binding sites mapped within primate- and human-specific TE. We identified 654,665 transcripts expressed from 477,507 distinct loci of different TE classes and families, the majority of which appear to have originated from primate-specific sequences. We discovered 4,687 human-specific and transcriptionally active TEs in DLPFC, of which the prominent majority (80.2%) appears spliced. Our analyses revealed significant associations of DLPFC-expressed TE with primate- and human-specific transcription factor binding sites, suggesting potential cross-talks of concordant regulatory functions. We identified 1,689 TEs differentially expressed in the DLPFC of Schizophrenia patients, a majority of which is located within introns of 1,137 protein-coding genes. Our findings imply that identified DLPFC-expressed TEs may affect human brain structures and functions following different evolutionary trajectories. 
On one side, hundreds of thousands of TEs maintained a remarkably high conservation for ∼8 My of primates’ evolution, suggesting that they are likely conveying evolutionary-constrained primate-specific regulatory functions. In parallel, thousands of transcriptionally active human-specific TE loci emerged more recently, suggesting that they could be relevant for human-specific behavioral or cognitive functions.
Affiliation(s)
- Guia Guffanti
  - Department of Psychiatry, Harvard Medical School, Cambridge, MA
  - Division of Depression and Anxiety, McLean Hospital, Belmont, MA
- Andrew Bartlett
  - Department of Psychology, University of Massachusetts, Boston, MA
- Torsten Klengel
  - Department of Psychiatry, Harvard Medical School, Cambridge, MA
  - Division of Depression and Anxiety, McLean Hospital, Belmont, MA
  - Department of Psychiatry and Psychotherapy, University Medical Center Göttingen, Georg-August-University, Goettingen, Germany
- Claudia Klengel
  - Department of Psychiatry, Harvard Medical School, Cambridge, MA
  - Division of Depression and Anxiety, McLean Hospital, Belmont, MA
- Richard Hunter
  - Department of Psychology, University of Massachusetts, Boston, MA
- Gennadi Glinsky
  - Translational & Functional Genomics, Institute of Engineering in Medicine, University of California San Diego, La Jolla, CA
- Fabio Macciardi
  - Department of Psychiatry and Human Behavior, University of California Irvine, Irvine, CA
|
44
|
Hariprakash JM, Ferrari F. Computational Biology Solutions to Identify Enhancers-target Gene Pairs. Comput Struct Biotechnol J 2019; 17:821-831. [PMID: 31316726 PMCID: PMC6611831 DOI: 10.1016/j.csbj.2019.06.012] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 03/15/2019] [Revised: 06/04/2019] [Accepted: 06/11/2019] [Indexed: 12/12/2022]
Abstract
Enhancers are non-coding regulatory elements that are distant from their target gene. Their characterization remains elusive, especially due to challenges in achieving a comprehensive pairing of enhancers and target genes. A number of computational biology solutions have been proposed to address this problem, leveraging the increasing availability of functional genomics data and the improved mechanistic understanding of enhancer action. In this review we focus on computational methods for genome-wide definition of enhancer-target gene pairs. We outline the different classes of methods, as well as their main advantages and limitations. The types of information integrated by each method, along with details on their applicability, are presented and discussed. We especially highlight the technical challenges that are still unresolved and hamper the achievement of a satisfactory and comprehensive solution. We expect this field will keep evolving in the coming years due to the ever-growing availability of data and increasing insights into enhancers' crucial role in regulating genome functionality.
Affiliation(s)
- Francesco Ferrari
  - IFOM, The FIRC Institute of Molecular Oncology, Milan, Italy
  - Institute of Molecular Genetics, National Research Council, Pavia, Italy
|
45
|
Perreault AA, Sprunger DM, Venters BJ. Epigenetic and transcriptional profiling of triple negative breast cancer. Sci Data 2019; 6:190033. [PMID: 30835260 PMCID: PMC6400101 DOI: 10.1038/sdata.2019.33] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 09/28/2018] [Accepted: 01/22/2019] [Indexed: 12/16/2022]
Abstract
The human HCC1806 cell line is frequently used as a preclinical model for triple negative breast cancer (TNBC). Given that dysregulated epigenetic mechanisms are involved in cancer pathogenesis, emerging therapeutic strategies target chromatin regulators, such as histone deacetylases. A comprehensive understanding of the epigenome and transcription profiling in HCC1806 provides the framework for evaluating efficacy and molecular mechanisms of epigenetic therapies. Thus, to study the interplay of transcription and chromatin in the HCC1806 preclinical model, we performed nascent transcription profiling using Precision Run-On coupled to sequencing (PRO-seq). Additionally, we mapped the genome-wide locations for RNA polymerase II (Pol II), the histone variant H2A.Z, seven histone modifications, and CTCF using ChIP-exo. ChIP-exonuclease (ChIP-exo) is a refined version of ChIP-seq with near base pair precision mapping of protein-DNA interactions. In this Data Descriptor, we present detailed information on experimental design, data generation, quality control analysis, and data validation. We discuss how these data lay the foundation for future analysis to understand the relationship between the nascent transcription and chromatin.
Affiliation(s)
- Andrea A. Perreault
  - Chemical and Physical Biology Program at Vanderbilt University, Nashville, TN, USA
- Danielle M. Sprunger
  - Department of Molecular Physiology and Biophysics, Vanderbilt Genetics Institute, Vanderbilt Ingram Cancer Center, Vanderbilt University, Nashville, TN, USA
- Bryan J. Venters
  - Department of Molecular Physiology and Biophysics, Vanderbilt Genetics Institute, Vanderbilt Ingram Cancer Center, Vanderbilt University, Nashville, TN, USA
|