1
Darieva Z, Zarrineh P, Phillips N, Mallen J, Garcia Mora A, Donaldson I, Bridoux L, Douglas M, Dias Henriques SF, Schulte D, Birket MJ, Bobola N. Ubiquitous MEIS transcription factors actuate lineage-specific transcription to establish cell fate. EMBO J 2025; 44:2232-2262. [PMID: 40021842 PMCID: PMC12000411 DOI: 10.1038/s44318-025-00385-5] [Received: 09/03/2024] [Revised: 01/30/2025] [Accepted: 01/31/2025] [Indexed: 03/03/2025]
Abstract
Control of gene expression is commonly mediated by distinct combinations of transcription factors (TFs). This cooperative action allows the integration of multiple biological signals at regulatory elements, resulting in highly specific gene expression patterns. It is unclear whether combinatorial binding is also necessary to bring together TFs with distinct biochemical functions, which collaborate to effectively recruit and activate RNA polymerase II. Using a cardiac differentiation model, we find that the largely ubiquitous homeodomain proteins MEIS act as actuators, fully activating transcriptional programs selected by lineage-restricted TFs. Combinatorial binding of MEIS with lineage-enriched TFs (GATA and HOX) provides selectivity, guiding MEIS to function at cardiac-specific enhancers. In turn, MEIS TFs promote the accumulation of the methyltransferase KMT2D to initiate lineage-specific enhancer commissioning. MEIS combinatorial binding dynamics, dictated by the changing dosage of its partners, drive cells into progressive stages of differentiation. Our results uncover tissue-specific transcriptional activation as the result of ubiquitous actuator TFs harnessing general transcriptional activators at tissue-specific enhancers, to which they are directed by binding with lineage- and domain-specific TFs.
Affiliation(s)
- Zoulfia Darieva
  - Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
- Peyman Zarrineh
  - Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
- Naomi Phillips
  - Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
- Joshua Mallen
  - Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
- Araceli Garcia Mora
  - Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
- Ian Donaldson
  - Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
- Laure Bridoux
  - Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
- Megan Douglas
  - Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
- Dorothea Schulte
  - Goethe University, University Hospital Frankfurt, Neurological Institute (Edinger Institute), Frankfurt am Main, Germany
- Matthew J Birket
  - Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
- Nicoletta Bobola
  - Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
2
Xu Y, Jiang X, Hu Z. Synergizing metabolomics and artificial intelligence for advancing precision oncology. Trends Mol Med 2025:S1471-4914(25)00016-4. [PMID: 39956738 DOI: 10.1016/j.molmed.2025.01.016] [Received: 11/02/2024] [Revised: 01/22/2025] [Accepted: 01/24/2025] [Indexed: 02/18/2025]
Abstract
Metabolomics has emerged as a transformative tool in precision oncology, with substantial potential for advancing biomarker discovery, monitoring treatment responses, and aiding drug development. Integrating artificial intelligence (AI) into metabolomics optimizes data acquisition and analysis, facilitating the interpretation of complex metabolic networks and enabling more effective multiomics integration. In this opinion, we explore recent advances in the application of metabolomics within precision oncology, emphasizing the unique advantages that AI-driven metabolomics offers. We propose that AI not only complements but also amplifies the potential of current platforms, accelerating research progress and ultimately improving patient outcomes. Finally, we discuss the opportunities and challenges involved in translating AI-driven metabolomics into clinical practice for precision oncology.
Affiliation(s)
- Yipeng Xu
  - School of Pharmaceutical Sciences, Tsinghua University, Beijing, 100084, China
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100871, China
- Xiaojuan Jiang
  - School of Pharmaceutical Sciences, Tsinghua University, Beijing, 100084, China
- Zeping Hu
  - School of Pharmaceutical Sciences, Tsinghua University, Beijing, 100084, China
  - Tsinghua-Peking Joint Center for Life Sciences, Tsinghua University, Beijing, 100084, China
3
Awdeh A, Turcotte M, Perkins TJ. Identifying transcription factors with cell-type specific DNA binding signatures. BMC Genomics 2024; 25:957. [PMID: 39402535 PMCID: PMC11472444 DOI: 10.1186/s12864-024-10859-1] [Received: 07/10/2024] [Accepted: 10/02/2024] [Indexed: 10/19/2024]
Abstract
BACKGROUND Transcription factors (TFs) bind to different parts of the genome in different types of cells, but it is usually assumed that the inherent DNA-binding preferences of a TF are invariant to cell type. Yet, there are several known examples of TFs that switch their DNA-binding preferences in different cell types, and yet more examples of other mechanisms, such as steric hindrance or cooperative binding, that may result in a "DNA signature" of differential binding. RESULTS To survey this phenomenon systematically, we developed a deep learning method we call SigTFB (Signatures of TF Binding) to detect and quantify cell-type specificity in a TF's known genomic binding sites. We used ENCODE ChIP-seq data to conduct a wide-scale investigation of 169 distinct TFs in up to 14 distinct cell types. SigTFB detected statistically significant DNA binding signatures in approximately two-thirds of TFs, far more than might have been expected from the relatively sparse evidence in prior literature. We found that the presence or absence of a cell-type specific DNA binding signature is distinct from, and indeed largely uncorrelated with, the degree of overlap between ChIP-seq peaks in different cell types. Such signatures tended to arise by two mechanisms: using established motifs at different frequencies, and selectively including motifs for distinct TFs. CONCLUSIONS While recent results have highlighted cell state features such as chromatin accessibility and gene expression in predicting TF binding, our results emphasize that, for some TFs, the DNA sequences of the binding sites contain substantial cell-type specific motifs.
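SigTFB itself is a deep learning model, but the first mechanism named above (established motifs used at different frequencies) can be illustrated with a much simpler per-motif test. The Python sketch below is not SigTFB: it counts how many binding-site sequences from two cell types contain a given motif on either strand, then applies a one-sided Fisher exact test to the resulting 2x2 table. The motif, the sequences, and all function names are hypothetical.

```python
from math import comb


def count_with_motif(seqs, motif):
    """Count sequences containing `motif` on either strand (exact string match)."""
    complement = str.maketrans("ACGT", "TGCA")
    rev_comp = motif.translate(complement)[::-1]
    return sum(1 for s in seqs if motif in s or rev_comp in s)


def fisher_enrichment_p(a, b, c, d):
    """One-sided Fisher exact p-value that set 1 is enriched for the motif.

    2x2 table [[a, b], [c, d]]: a/b = set 1 with/without the motif,
    c/d = set 2 with/without it. Computed as a hypergeometric upper tail.
    """
    n1 = a + b          # size of set 1
    k = a + c           # sequences with the motif across both sets
    n = a + b + c + d   # all sequences
    tail = sum(comb(k, x) * comb(n - k, n1 - x) for x in range(a, min(k, n1) + 1))
    return tail / comb(n, n1)


# Hypothetical binding sites for one TF in two cell types.
cell_type_1 = ["AGATAAGC", "TTGATAAC", "CCGATAAG", "ACGTACGT"]
cell_type_2 = ["ACGTACGT", "TTTTCCCC", "GGGATAAA", "CAGTCAGT"]

hits_1 = count_with_motif(cell_type_1, "GATAA")
hits_2 = count_with_motif(cell_type_2, "GATAA")
p = fisher_enrichment_p(hits_1, len(cell_type_1) - hits_1,
                        hits_2, len(cell_type_2) - hits_2)
print(hits_1, hits_2, round(p, 4))  # 3 1 0.2429
```

With only four sequences per set the difference is not significant; a real comparison would use thousands of peaks per cell type and correct for testing many motifs.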
Affiliation(s)
- Aseel Awdeh
  - School of Electrical Engineering and Computer Science, University of Ottawa, 800 King Edward Ave., Ottawa, K1N 6N5, Ontario, Canada
  - Regenerative Medicine Program, Ottawa Hospital Research Institute, 501 Smyth Rd., Ottawa, K1H 8L6, Ontario, Canada
- Marcel Turcotte
  - School of Electrical Engineering and Computer Science, University of Ottawa, 800 King Edward Ave., Ottawa, K1N 6N5, Ontario, Canada
- Theodore J Perkins
  - School of Electrical Engineering and Computer Science, University of Ottawa, 800 King Edward Ave., Ottawa, K1N 6N5, Ontario, Canada
  - Regenerative Medicine Program, Ottawa Hospital Research Institute, 501 Smyth Rd., Ottawa, K1H 8L6, Ontario, Canada
  - Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Rd., Ottawa, K1H 8M5, Ontario, Canada
4
Rauluseviciute I, Riudavets-Puig R, Blanc-Mathieu R, Castro-Mondragon J, Ferenc K, Kumar V, Lemma RB, Lucas J, Chèneby J, Baranasic D, Khan A, Fornes O, Gundersen S, Johansen M, Hovig E, Lenhard B, Sandelin A, Wasserman W, Parcy F, Mathelier A. JASPAR 2024: 20th anniversary of the open-access database of transcription factor binding profiles. Nucleic Acids Res 2024; 52:D174-D182. [PMID: 37962376 PMCID: PMC10767809 DOI: 10.1093/nar/gkad1059] [Received: 09/15/2023] [Revised: 10/20/2023] [Accepted: 10/31/2023] [Indexed: 11/15/2023]
Abstract
JASPAR (https://jaspar.elixir.no/) is a widely-used open-access database presenting manually curated high-quality and non-redundant DNA-binding profiles for transcription factors (TFs) across taxa. In this 10th release and 20th-anniversary update, the CORE collection has expanded with 329 new profiles. We updated three existing profiles and provided orthogonal support for 72 profiles from the previous release's UNVALIDATED collection. Altogether, the JASPAR 2024 update provides a 20% increase in CORE profiles from the previous release. A trimming algorithm enhanced profiles by removing low information content flanking base pairs, which were likely uninformative (within the capacity of the PFM models) for TFBS predictions and modelling TF-DNA interactions. This release includes enhanced metadata, featuring a refined classification for plant TFs' structural DNA-binding domains. The new JASPAR collections prompt updates to the genomic tracks of predicted TF binding sites (TFBSs) in 8 organisms, with human and mouse tracks available as native tracks in the UCSC Genome browser. All data are available through the JASPAR web interface and programmatically through its API and the updated Bioconductor and pyJASPAR packages. Finally, a new TFBS extraction tool enables users to retrieve predicted JASPAR TFBSs intersecting their genomic regions of interest.
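The trimming step described above removes low-information flanking columns from a position frequency matrix (PFM). The Python sketch below shows the general idea under stated assumptions: each column's information content is computed against a uniform background, and columns are dropped from either end while they fall below a hypothetical 0.5-bit cutoff. JASPAR's actual trimming algorithm and thresholds may differ; this is illustrative only.

```python
import math


def column_ic(counts):
    """Information content (bits) of one PFM column [A, C, G, T], uniform background."""
    total = sum(counts)
    ic = 2.0  # log2(4): the maximum for DNA
    for c in counts:
        if c > 0:
            p = c / total
            ic += p * math.log2(p)
    return ic


def trim_flanks(pfm, min_ic=0.5):
    """Remove leading/trailing columns below `min_ic`; interior columns are kept."""
    ics = [column_ic(col) for col in pfm]
    start, end = 0, len(pfm)
    while start < end and ics[start] < min_ic:
        start += 1
    while end > start and ics[end - 1] < min_ic:
        end -= 1
    return pfm[start:end]


# Hypothetical PFM, one [A, C, G, T] count column per position.
pfm = [
    [25, 25, 25, 25],   # uninformative flank (0 bits)
    [0, 0, 100, 0],     # G
    [100, 0, 0, 0],     # A
    [0, 0, 0, 100],     # T
    [100, 0, 0, 0],     # A
    [30, 20, 25, 25],   # near-uniform flank (about 0.01 bits)
]
core = trim_flanks(pfm)
print(len(core), [round(column_ic(c), 2) for c in core])  # 4 [2.0, 2.0, 2.0, 2.0]
```

Only the flanks are trimmed, so a weakly constrained spacer position inside a motif would be preserved.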
Affiliation(s)
- Ieva Rauluseviciute
  - Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
- Rafael Riudavets-Puig
  - Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
- Romain Blanc-Mathieu
  - Laboratoire Physiologie Cellulaire et Végétale, Univ. Grenoble Alpes, CNRS, CEA, INRAE, IRIG-DBSCI-LPCV, 17 avenue des martyrs, F-38054, Grenoble, France
- Jaime A Castro-Mondragon
  - Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
- Katalin Ferenc
  - Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
- Vipin Kumar
  - Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
- Roza Berhanu Lemma
  - Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
- Jérémy Lucas
  - Laboratoire Physiologie Cellulaire et Végétale, Univ. Grenoble Alpes, CNRS, CEA, INRAE, IRIG-DBSCI-LPCV, 17 avenue des martyrs, F-38054, Grenoble, France
- Jeanne Chèneby
  - Center for Bioinformatics, Department of Informatics, University of Oslo, Oslo, Norway
- Damir Baranasic
  - MRC London Institute of Medical Sciences, Du Cane Road, London W12 0NN, UK
  - Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, Hammersmith Hospital Campus, Du Cane Road, London W12 0NN, UK
  - Division of Electronics, Ruđer Bošković Institute, Bijenička cesta, 10000 Zagreb, Croatia
- Aziz Khan
  - Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
  - Stanford Cancer Institute, Stanford University School of Medicine, Stanford, CA 94305, USA
- Oriol Fornes
  - Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children's Hospital Research Institute, University of British Columbia, 950 W 28th Ave, Vancouver, BC V5Z 4H4, Canada
- Sveinung Gundersen
  - Center for Bioinformatics, Department of Informatics, University of Oslo, Oslo, Norway
- Morten Johansen
  - Center for Bioinformatics, Department of Informatics, University of Oslo, Oslo, Norway
- Eivind Hovig
  - Center for Bioinformatics, Department of Informatics, University of Oslo, Oslo, Norway
  - Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital, 0424 Oslo, Norway
- Boris Lenhard
  - MRC London Institute of Medical Sciences, Du Cane Road, London W12 0NN, UK
  - Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, Hammersmith Hospital Campus, Du Cane Road, London W12 0NN, UK
- Albin Sandelin
  - Department of Biology and Biotech Research and Innovation Centre, University of Copenhagen, Ole Maaløes Vej 5, DK2200 Copenhagen N, Denmark
- Wyeth W Wasserman
  - Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children's Hospital Research Institute, University of British Columbia, 950 W 28th Ave, Vancouver, BC V5Z 4H4, Canada
- François Parcy
  - Laboratoire Physiologie Cellulaire et Végétale, Univ. Grenoble Alpes, CNRS, CEA, INRAE, IRIG-DBSCI-LPCV, 17 avenue des martyrs, F-38054, Grenoble, France
- Anthony Mathelier
  - Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
  - Center for Bioinformatics, Department of Informatics, University of Oslo, Oslo, Norway
  - Department of Medical Genetics, Institute of Clinical Medicine, University of Oslo and Oslo University Hospital, Oslo, Norway
5
Bobola N, Sagerström CG. TALE transcription factors: Cofactors no more. Semin Cell Dev Biol 2024; 152-153:76-84. [PMID: 36509674 DOI: 10.1016/j.semcdb.2022.11.015] [Received: 07/19/2022] [Revised: 10/27/2022] [Accepted: 11/30/2022] [Indexed: 12/14/2022]
Abstract
Exd/PBX, Hth/MEIS and PREP proteins belong to the TALE (three-amino-acid loop extension) superclass of transcription factors (TFs) with an atypical homeodomain (HD). TALE proteins were originally discovered as "cofactors" to HOX proteins; revisiting this traditional role in light of genome-wide experiments reveals a strong and reproducible pattern of HOX and TALE co-occupancy across diverse embryonic tissues. While confirming that TALE increases HOX specificity and selectivity in vivo, this wider outlook also reveals novel aspects of HOX:TALE collaboration, namely that HOX TFs generally require pre-bound TALE factors to access their functional binding sites in vivo. In contrast to the restricted expression domains of HOX TFs, TALE factors are largely ubiquitous, and PBX and PREP are expressed at the earliest developmental stages. PBX and MEIS control development of many organs and tissues, and their dysregulation is associated with congenital disease and cancer. Accordingly, many instances of TALE cooperation with non-HOX TFs have been documented in various systems. The model that emerges from these studies is that TALE TFs create a permissive chromatin platform that is selected by tissue-restricted TFs for binding. In turn, HOX and other tissue-restricted TFs selectively convert a ubiquitous pool of low-affinity TALE binding events into high-confidence, tissue-restricted binding events associated with transcriptional activation. As a result, TALE:TF complexes are associated with active chromatin and domain/lineage-specific gene activity. TALE ubiquitous expression and broad genomic occupancy, as well as the increasing examples of TALE tissue-specific partners, reveal a universal and obligatory role for TALE in the control of tissue and lineage-specific transcriptional programs, beyond their initial discovery as HOX cofactors.
Affiliation(s)
- Nicoletta Bobola
  - School of Medical Sciences, University of Manchester, Manchester, UK
- Charles G Sagerström
  - Section of Developmental Biology, Department of Pediatrics, University of Colorado Medical School, Aurora, CO, USA
6
Tahara S, Tsuchiya T, Matsumoto H, Ozaki H. Transcription factor-binding k-mer analysis clarifies the cell type dependency of binding specificities and cis-regulatory SNPs in humans. BMC Genomics 2023; 24:597. [PMID: 37805453 PMCID: PMC10560430 DOI: 10.1186/s12864-023-09692-9] [Received: 06/21/2023] [Accepted: 09/21/2023] [Indexed: 10/09/2023]
Abstract
BACKGROUND Transcription factors (TFs) exhibit heterogeneous DNA-binding specificities in individual cells and whole organisms under natural conditions, and de novo motif discovery usually provides multiple motifs, even from a single chromatin immunoprecipitation-sequencing (ChIP-seq) sample. Despite the accumulation of ChIP-seq data and ChIP-seq-derived motifs, the diversity of DNA-binding specificities across different TFs and cell types remains largely unexplored. RESULTS Here, we applied MOCCS2, our k-mer-based motif discovery method, to a collection of human TF ChIP-seq samples across diverse TFs and cell types, and systematically computed profiles of TF-binding specificity scores for all k-mers. After quality control, we compiled a set of TF-binding specificity score profiles for 2,976 high-quality ChIP-seq samples, comprising 473 TFs and 398 cell types. Using these high-quality samples, we confirmed that the k-mer-based TF-binding specificity profiles reflected TF- or TF-family dependent DNA-binding specificities. We then compared the binding specificity scores of ChIP-seq samples with the same TFs but with different cell type classes and found that half of the analyzed TFs exhibited differences in DNA-binding specificities across cell type classes. Additionally, we devised a method to detect differentially bound k-mers between two ChIP-seq samples and detected k-mers exhibiting statistically significant differences in binding specificity scores. Moreover, we demonstrated that differences in the binding specificity scores between k-mers on the reference and alternative alleles could be used to predict the effect of variants on TF binding, as validated by in vitro and in vivo assay datasets. Finally, we demonstrated that binding specificity score differences can be used to interpret disease-associated non-coding single-nucleotide polymorphisms (SNPs) as TF-affecting SNPs and provide candidates responsible for TFs and cell types. 
CONCLUSIONS Our study provides a basis for investigating the regulation of gene expression in a TF-, TF family-, or cell-type-dependent manner. Furthermore, our differential analysis of binding-specificity scores highlights noncoding disease-associated variants in humans.
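The variant-effect analysis above compares binding specificity scores of k-mers on the reference and alternative alleles. The Python sketch below assumes a precomputed table of per-k-mer scores (in the study these come from MOCCS2; here the values are invented) and sums the scores of every k-mer overlapping the variant on each allele. Summation as the aggregation rule, scanning only one strand, and all function names are simplifying assumptions, not the authors' exact procedure.

```python
def kmers_covering(seq, pos, k):
    """All length-k substrings of `seq` that include index `pos`."""
    lo = max(0, pos - k + 1)
    hi = min(pos, len(seq) - k)
    return [seq[i:i + k] for i in range(lo, hi + 1)]


def allele_score_delta(seq, pos, alt, scores, k):
    """Summed k-mer scores on the alt allele minus those on the ref allele.

    k-mers missing from `scores` contribute 0. Only the given strand is scanned.
    """
    alt_seq = seq[:pos] + alt + seq[pos + 1:]
    ref_total = sum(scores.get(km, 0.0) for km in kmers_covering(seq, pos, k))
    alt_total = sum(scores.get(km, 0.0) for km in kmers_covering(alt_seq, pos, k))
    return alt_total - ref_total


# Hypothetical 3-mer specificity scores for one TF in one cell type.
scores = {"GAT": 1.0, "ATA": 0.5, "TAA": 0.8, "GAC": 0.1}
delta = allele_score_delta("TTGATAAGG", 4, "C", scores, k=3)
print(round(delta, 3))  # negative: the C allele disrupts the high-scoring k-mers
```

A strongly negative value flags a candidate TF-disrupting variant; in practice the reverse-complement k-mers and a significance threshold would also be needed.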
Affiliation(s)
- Saeko Tahara
  - Bioinformatics Laboratory, Institute of Medicine, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki, 305-8577, Japan
  - School of Medicine, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki, 305-8577, Japan
- Takaho Tsuchiya
  - Bioinformatics Laboratory, Institute of Medicine, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki, 305-8577, Japan
  - Center for Artificial Intelligence Research, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki, 305-8577, Japan
- Hirotaka Matsumoto
  - School of Information and Data Sciences, Nagasaki University, 1-14, Bunkyo-Machi, Nagasaki City, Nagasaki, 852-8521, Japan
  - Laboratory for Bioinformatics Research, RIKEN Center for Biosystems Dynamics, Wako, Saitama, 351-0198, Japan
- Haruka Ozaki
  - Bioinformatics Laboratory, Institute of Medicine, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki, 305-8577, Japan
  - Center for Artificial Intelligence Research, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki, 305-8577, Japan
  - Laboratory for Bioinformatics Research, RIKEN Center for Biosystems Dynamics, Wako, Saitama, 351-0198, Japan
7
Zhang Y, Liu Y, Wang Z, Wang M, Xiong S, Huang G, Gong M. Uncovering the Relationship between Tissue-Specific TF-DNA Binding and Chromatin Features through a Transformer-Based Model. Genes (Basel) 2022; 13:1952. [PMID: 36360189 PMCID: PMC9690320 DOI: 10.3390/genes13111952] [Received: 09/16/2022] [Revised: 10/19/2022] [Accepted: 10/23/2022] [Indexed: 09/08/2024]
Abstract
Chromatin features can reveal tissue-specific TF-DNA binding, which leads to a better understanding of many critical physiological processes. Accurately identifying TF-DNA bindings and constructing their relationships with chromatin features is a long-standing goal in the bioinformatic field. However, this has remained elusive due to the complex binding mechanisms and heterogeneity among inputs. Here, we have developed the GHTNet (General Hybrid Transformer Network), a transformer-based model to predict TF-DNA binding specificity. The GHTNet decodes the relationship between tissue-specific TF-DNA binding and chromatin features via a specific input scheme of alternative inputs and reveals important gene regions and tissue-specific motifs. Our experiments show that the GHTNet has excellent performance, achieving about a 5% absolute improvement over existing methods. The TF-DNA binding mechanism analysis shows that the importance of TF-DNA binding features varies across tissues. The best predictor is based on the DNA sequence, followed by epigenomics and shape. In addition, cross-species studies address the limited data, thus providing new ideas in this case. Moreover, the GHTNet is applied to interpret the relationship among TFs, chromatin features, and diseases associated with AD46 tissue. This paper demonstrates that the GHTNet is an accurate and robust framework for deciphering tissue-specific TF-DNA binding and interpreting non-coding regions.
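GHTNet's specific architecture is not described in this abstract, but transformer-based models are built around scaled dot-product self-attention, softmax(QK^T / sqrt(d)) V. The pure-Python sketch below applies one attention head to a one-hot encoded DNA sequence, so each position's output is a weighted mix of all positions. The identity projection matrices stand in for learned weights and are assumptions for illustration; this is a generic transformer building block, not GHTNet.

```python
import math


def one_hot(seq):
    index = {"A": 0, "C": 1, "G": 2, "T": 3}
    return [[1.0 if index[b] == j else 0.0 for j in range(4)] for b in seq]


def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention; returns (output, weights)."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d = len(q[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v), weights


# Hypothetical stand-in for learned projections: 4x4 identity matrices.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
out, attn = self_attention(one_hot("GATA"), identity, identity, identity)
print([round(w, 3) for w in attn[1]])  # attention weights from the first A
```

With identity projections, each base attends most strongly to identical bases; trained projections would instead learn which positions (for example, motif partners) to attend to.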
Affiliation(s)
- Yongqing Zhang
  - School of Computer Science, Chengdu University of Information Technology, Chengdu 610225, China
- Yuhang Liu
  - School of Computer Science, Chengdu University of Information Technology, Chengdu 610225, China
- Zixuan Wang
  - School of Computer Science, Chengdu University of Information Technology, Chengdu 610225, China
- Maocheng Wang
  - School of Computer Science, Chengdu University of Information Technology, Chengdu 610225, China
- Shuwen Xiong
  - School of Computer Science, Chengdu University of Information Technology, Chengdu 610225, China
- Guo Huang
  - School of Electronic Information and Artificial Intelligence, Leshan Normal University, Leshan 614000, China
- Meiqin Gong
  - West China Second University Hospital, Sichuan University, Chengdu 610041, China
8
Tsimenidis S, Vrochidou E, Papakostas GA. Omics Data and Data Representations for Deep Learning-Based Predictive Modeling. Int J Mol Sci 2022; 23:12272. [PMID: 36293133 PMCID: PMC9603455 DOI: 10.3390/ijms232012272] [Received: 08/23/2022] [Revised: 10/03/2022] [Accepted: 10/12/2022] [Indexed: 11/25/2022]
Abstract
Medical discoveries mainly depend on the capability to process and analyze biological datasets, which inundate the scientific community and keep expanding as the cost of next-generation sequencing decreases. Deep learning (DL) is a viable method to exploit this massive data stream, since it has advanced quickly through successive innovations. However, an obstacle to scientific progress emerges: applying DL to biology is difficult because both fields are evolving at a breakneck pace, making it hard for an individual to occupy the front lines of both. This paper aims to bridge the gap and help computer scientists bring their valuable expertise into the life sciences. This work provides an overview of the most common types of biological data and data representations that are used to train DL models, with additional information on the models themselves and the various tasks that are being tackled. This is the essential information a DL expert with no background in biology needs in order to participate in DL-based research projects in biomedicine, biotechnology, and drug discovery. Conversely, this study could also help biology researchers understand and utilize the power of DL to gain better insights into, and extract important information from, omics data.
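One of the most common DNA representations for training DL models is one-hot encoding, which turns a sequence of length L into an L x 4 binary matrix. The Python sketch below is a minimal version under a few assumptions chosen here for illustration: the column order A, C, G, T; all-zero rows for ambiguous bases such as N; and right-side zero-padding or truncation to a fixed model input length.

```python
BASES = "ACGT"


def one_hot(seq, length=None):
    """Encode DNA as a list of [A, C, G, T] rows.

    Ambiguous bases (e.g. N) become all-zero rows. If `length` is given, the
    result is truncated or zero-padded on the right to exactly that many rows.
    """
    index = {b: i for i, b in enumerate(BASES)}
    rows = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    if length is not None:
        rows = rows[:length] + [[0, 0, 0, 0] for _ in range(length - len(rows))]
    return rows


print(one_hot("ACgN", length=5))
# [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
```

Fixed-length padding matters because convolutional and transformer models usually expect equally sized inputs within a batch.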
Affiliation(s)
- George A. Papakostas
  - MLV Research Group, Department of Computer Science, International Hellenic University, 65404 Kavala, Greece
9
Jain A, Mittal S, Tripathi LP, Nussinov R, Ahmad S. Host-pathogen protein-nucleic acid interactions: A comprehensive review. Comput Struct Biotechnol J 2022; 20:4415-4436. [PMID: 36051878 PMCID: PMC9420432 DOI: 10.1016/j.csbj.2022.08.001] [Received: 03/30/2022] [Revised: 08/01/2022] [Accepted: 08/01/2022] [Indexed: 12/02/2022]
Abstract
Recognition of pathogen-derived nucleic acids by host cells is an effective host strategy to detect pathogenic invasion and trigger immune responses. In the context of pathogen-specific pharmacology, there is a growing interest in mapping the interactions between pathogen-derived nucleic acids and host proteins. Insight into the principles of the structural and immunological mechanisms underlying such interactions and their roles in host defense is necessary to guide therapeutic intervention. Here, we discuss the newest advances in studies of molecular interactions involving pathogen nucleic acids and host factors, including their drug design, molecular structure and specific patterns. We observed that two groups of nucleic acid recognizing molecules, Toll-like receptors (TLRs) and the cytoplasmic retinoic acid-inducible gene (RIG)-I-like receptors (RLRs) form the backbone of host responses to pathogen nucleic acids, with additional support provided by absent in melanoma 2 (AIM2) and DNA-dependent activator of Interferons (IFNs)-regulatory factors (DAI) like cytosolic activity. We review the structural, immunological, and other biological aspects of these representative groups of molecules, especially in terms of their target specificity and affinity and challenges in leveraging host-pathogen protein-nucleic acid interactions (HP-PNI) in drug discovery.
Affiliation(s)
- Anuja Jain
  - School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi 110067, India
- Shikha Mittal
  - School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi 110067, India
  - Department of Biotechnology and Bioinformatics, Jaypee University of Information Technology, Waknaghat, Solan, Himachal Pradesh, 173234, India
- Lokesh P. Tripathi
  - National Institutes of Biomedical Innovation, Health and Nutrition, Ibaraki, Osaka, Japan
  - Riken Center for Integrative Medical Sciences, Tsurumi, Yokohama, Kanagawa, Japan
- Ruth Nussinov
  - Computational Structural Biology Section, Basic Science Program, Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA
  - Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Israel
- Shandar Ahmad
  - School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi 110067, India
10
Cui F, Zhang Z, Cao C, Zou Q, Chen D, Su X. Protein-DNA/RNA interactions: Machine intelligence tools and approaches in the era of artificial intelligence and big data. Proteomics 2022; 22:e2100197. [PMID: 35112474 DOI: 10.1002/pmic.202100197] [Received: 11/30/2021] [Revised: 01/02/2022] [Accepted: 01/17/2022] [Indexed: 11/09/2022]
Abstract
With the development of artificial intelligence technologies and the availability of large amounts of biological data, computational methods for proteomics have undergone a developmental process from traditional machine learning to deep learning. This review focuses on computational approaches and tools for the prediction of protein-DNA/RNA interactions using machine intelligence techniques. We provide an overview of the development progress of computational methods and summarize the advantages and shortcomings of these methods. We further compiled applications in tasks related to the protein-DNA/RNA interactions, and pointed out possible future application trends. Moreover, biological sequence-digitizing representation strategies used in different types of computational methods are also summarized and discussed.
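A simple example of a sequence-digitizing representation is a k-mer frequency vector, which maps a sequence of any length to a fixed-length numeric feature vector usable by classical machine learning models. The Python sketch below shows one such encoding, not a specific tool from the review; the choice of k, normalization to frequencies, and skipping k-mers with symbols outside the alphabet are assumptions. Passing the 20 amino-acid letters as `alphabet` gives the protein equivalent.

```python
from itertools import product


def kmer_frequency_vector(seq, k=2, alphabet="ACGT"):
    """Frequencies of all len(alphabet)**k k-mers, ordered as itertools.product yields them.

    k-mers containing symbols outside `alphabet` are skipped.
    """
    vocab = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {km: i for i, km in enumerate(vocab)}
    counts = [0] * len(vocab)
    for i in range(len(seq) - k + 1):
        km = seq[i:i + k]
        if km in index:
            counts[index[km]] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(vocab)


vec = kmer_frequency_vector("AACGT", k=2)
print(len(vec), vec[:2])  # 16 [0.25, 0.25]
```

The vector length depends only on k and the alphabet size, which is what lets sequences of different lengths share one feature space; the cost is that positional information is discarded.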
Affiliation(s)
- Feifei Cui
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, 324000, China
- Zilong Zhang
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, 324000, China
- Chen Cao
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, 324000, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, 324000, China
- Dong Chen
  - College of Electrical and Information Engineering, Quzhou University, Quzhou, 324000, China
- Xi Su
  - Foshan Maternal and Child Health Hospital, Foshan, Guangdong, China
11
Hammelman J, Gifford DK. Discovering differential genome sequence activity with interpretable and efficient deep learning. PLoS Comput Biol 2021; 17:e1009282. [PMID: 34370721 PMCID: PMC8376110 DOI: 10.1371/journal.pcbi.1009282] [Received: 03/06/2021] [Revised: 08/19/2021] [Accepted: 07/16/2021] [Indexed: 11/23/2022]
Abstract
Discovering sequence features that differentially direct cells to alternate fates is key to understanding both cellular development and the consequences of disease-related mutations. We introduce Expected Pattern Effect and Differential Expected Pattern Effect, two black-box methods that can interpret genome regulatory sequences for cell type-specific or condition-specific patterns. We show that these methods identify relevant transcription factor motifs and spacings that are predictive of cell state-specific chromatin accessibility. Finally, we integrate these methods into a framework that is readily accessible to non-experts and can be downloaded as a binary or installed via PyPI or bioconda at https://cgs.csail.mit.edu/deepaccess-package/.
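The idea behind a pattern-effect score can be illustrated by writing a candidate pattern into many background sequences and averaging how much a trained model's prediction changes, with the differential version contrasting two cell type- or condition-specific outputs. The Python sketch below is a simplified reading of that idea, not the DeepAccess implementation: the toy "models" are hypothetical motif counters standing in for trained networks, and using a plain mean difference (rather than, for example, a log ratio) is an assumption.

```python
def insert_pattern(seq, pattern, pos):
    """Overwrite `seq` with `pattern` starting at `pos` (length is preserved)."""
    return seq[:pos] + pattern + seq[pos + len(pattern):]


def expected_pattern_effect(model, backgrounds, pattern, pos):
    """Mean change in model output when `pattern` is inserted into each background."""
    deltas = [model(insert_pattern(s, pattern, pos)) - model(s) for s in backgrounds]
    return sum(deltas) / len(deltas)


def differential_pattern_effect(model_a, model_b, backgrounds, pattern, pos):
    """Expected effect under condition A minus expected effect under condition B."""
    return (expected_pattern_effect(model_a, backgrounds, pattern, pos)
            - expected_pattern_effect(model_b, backgrounds, pattern, pos))


# Hypothetical stand-ins for two trained, condition-specific models.
def model_a(s):
    return s.count("GATA")


def model_b(s):
    return s.count("TAAT")


backgrounds = ["C" * 20, "G" * 20]
print(expected_pattern_effect(model_a, backgrounds, "GATA", 5),
      differential_pattern_effect(model_a, model_b, backgrounds, "GATA", 5))  # 1.0 1.0
```

Averaging over many backgrounds is what makes the score "expected": it separates the pattern's own effect from accidental interactions with any single sequence context.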
Affiliation(s)
- Jennifer Hammelman
- Computational and Systems Biology, MIT, Cambridge, Massachusetts, United States of America
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, Massachusetts, United States of America
- David K. Gifford
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, Massachusetts, United States of America
- Department of Electrical Engineering & Computer Science, MIT, Cambridge, Massachusetts, United States of America
- Department of Biological Engineering, MIT, Cambridge, Massachusetts, United States of America
12
Bridoux L, Zarrineh P, Mallen J, Phuycharoen M, Latorre V, Ladam F, Losa M, Baker SM, Sagerstrom C, Mace KA, Rattray M, Bobola N. HOX paralogs selectively convert binding of ubiquitous transcription factors into tissue-specific patterns of enhancer activation. PLoS Genet 2020; 16:e1009162. [PMID: 33315856 PMCID: PMC7769617 DOI: 10.1371/journal.pgen.1009162] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 12/23/2019] [Revised: 12/28/2020] [Accepted: 09/28/2020] [Indexed: 11/18/2022]
Abstract
Gene expression programs determine cell fate in embryonic development, and their dysregulation results in disease. Transcription factors (TFs) control gene expression by binding to enhancers, but how TFs select and activate their target enhancers is still unclear. HOX TFs share conserved homeodomains with highly similar sequence recognition properties, yet they impart the identity of different animal body parts. To understand how HOX TFs control their specific transcriptional programs in vivo, we compared HOXA2 and HOXA3 binding profiles in the mouse embryo. HOXA2 and HOXA3 directly cooperate with TALE TFs and selectively target different subsets of a broad TALE chromatin platform. Binding of HOX and tissue-specific TFs converts low-affinity TALE binding into high-confidence, tissue-specific binding events, which bear the mark of active enhancers. We propose that HOX paralogs, alone and in combination with tissue-specific TFs, generate tissue-specific transcriptional outputs by modulating the activity of TALE TFs at selected enhancers.
Affiliation(s)
- Laure Bridoux
- School of Medical Sciences, University of Manchester, Manchester, United Kingdom
- Peyman Zarrineh
- School of Health Sciences, University of Manchester, Manchester, United Kingdom
- Joshua Mallen
- School of Medical Sciences, University of Manchester, Manchester, United Kingdom
- Mike Phuycharoen
- Department of Computer Science, University of Manchester, Manchester, United Kingdom
- Victor Latorre
- School of Medical Sciences, University of Manchester, Manchester, United Kingdom
- Frank Ladam
- Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts, United States of America
- Marta Losa
- School of Medical Sciences, University of Manchester, Manchester, United Kingdom
- Syed Murtuza Baker
- School of Health Sciences, University of Manchester, Manchester, United Kingdom
- School of Biological Sciences, University of Manchester, Manchester, United Kingdom
- Charles Sagerstrom
- Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts, United States of America
- Kimberly A. Mace
- School of Biological Sciences, University of Manchester, Manchester, United Kingdom
- Magnus Rattray
- School of Health Sciences, University of Manchester, Manchester, United Kingdom
- Nicoletta Bobola
- School of Medical Sciences, University of Manchester, Manchester, United Kingdom
13
Jha A, Aicher JK, Gazzara MR, Singh D, Barash Y. Enhanced Integrated Gradients: improving interpretability of deep learning models using splicing codes as a case study. Genome Biol 2020; 21:149. [PMID: 32560708 PMCID: PMC7305616 DOI: 10.1186/s13059-020-02055-7] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 10/25/2019] [Accepted: 05/22/2020] [Indexed: 01/03/2023]
Abstract
Despite the success and fast adoption of deep learning models in biomedical domains, their lack of interpretability remains an issue. Here, we introduce Enhanced Integrated Gradients (EIG), a method to identify significant features associated with a specific prediction task. Using RNA splicing prediction as well as digit classification as case studies, we demonstrate that EIG improves upon the original Integrated Gradients method and produces sets of informative features. We then apply EIG to identify A1CF as a key regulator of liver-specific alternative splicing, supporting this finding with subsequent analysis of relevant A1CF functional (RNA-seq) and binding (PAR-CLIP) data.
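EIG builds on the original Integrated Gradients (IG) method, which attributes a prediction to input features by averaging the model's gradient along a path from a baseline input to the actual input and scaling by the input-baseline difference. The abstract does not say what EIG changes, so the sketch below shows only standard straight-line IG, not EIG. The logistic-regression stand-in model, its weights, and the all-zero baseline are hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical differentiable model: logistic regression F(x) = sigmoid(w.x + b).
W = np.array([1.5, -2.0, 0.5, 3.0])
B = -0.25


def model(x):
    return 1.0 / (1.0 + np.exp(-(x @ W + B)))


def model_grad(x):
    # Analytic gradient of the logistic model: dF/dx = F(x) * (1 - F(x)) * w
    f = model(x)
    return f * (1.0 - f) * W


def integrated_gradients(x, baseline, steps=200):
    """Midpoint Riemann approximation of IG along the straight line
    from `baseline` to `x`."""
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.array([model_grad(baseline + a * (x - baseline)) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)


x = np.array([1.0, 0.2, -0.5, 0.8])
baseline = np.zeros_like(x)
attr = integrated_gradients(x, baseline)
print(attr)
# Completeness: attributions should sum to F(x) - F(baseline),
# up to discretisation error.
print(attr.sum(), model(x) - model(baseline))
```

The completeness check is a useful sanity test for any IG variant: if the attributions do not approximately add up to the change in model output, the path or the number of integration steps is likely inadequate.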
Affiliation(s)
- Anupama Jha
- Department of Computer and Information Science, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, USA
- Joseph K Aicher
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, USA
- Matthew R Gazzara
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, USA
- Deependra Singh
- Department of Computer and Information Science, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, USA
- Yoseph Barash
- Department of Computer and Information Science, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, USA
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, USA