2
Vorontsov IE, Eliseeva IA, Zinkevich A, Nikonov M, Abramov S, Boytsov A, Kamenets V, Kasianova A, Kolmykov S, Yevshin I, Favorov A, Medvedeva YA, Jolma A, Kolpakov F, Makeev V, Kulakovskiy I. HOCOMOCO in 2024: a rebuild of the curated collection of binding models for human and mouse transcription factors. Nucleic Acids Res 2024; 52:D154-D163. [PMID: 37971293 PMCID: PMC10767914 DOI: 10.1093/nar/gkad1077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2023] [Revised: 10/17/2023] [Accepted: 10/26/2023] [Indexed: 11/19/2023]
Abstract
We present a major update of the HOCOMOCO collection that provides DNA binding specificity patterns of 949 human transcription factors and 720 mouse orthologs. To make this release, we performed motif discovery in peak sets that originated from 14 183 ChIP-Seq experiments and reads from 2554 HT-SELEX experiments yielding more than 400 thousand candidate motifs. The candidate motifs were annotated according to their similarity to known motifs and the hierarchy of DNA-binding domains of the respective transcription factors. Next, the motifs underwent human expert curation to stratify distinct motif subtypes and remove non-informative patterns and common artifacts. Finally, the curated subset of 100 thousand motifs was supplied to the automated benchmarking to select the best-performing motifs for each transcription factor. The resulting HOCOMOCO v12 core collection contains 1443 verified position weight matrices, including distinct subtypes of DNA binding motifs for particular transcription factors. In addition to the core collection, HOCOMOCO v12 provides motif sets optimized for the recognition of binding sites in vivo and in vitro, and for annotation of regulatory sequence variants. HOCOMOCO is available at https://hocomoco12.autosome.org and https://hocomoco.autosome.org.
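As an illustration of how a position weight matrix of the kind described in this abstract can be used to recognize binding sites, the following is a minimal sketch, not the HOCOMOCO pipeline. The function names, the pseudocount of 1.0, the uniform background of 0.25, and the toy count matrix are all assumptions made for the example; only the forward strand is scanned and the sequence is assumed to contain uppercase A, C, G, T only.

```python
import math

BASES = "ACGT"  # column order assumed for every matrix position


def pfm_to_pwm(pfm, pseudocount=1.0, background=0.25):
    """Convert count columns [A, C, G, T] into log2-odds weights.

    The pseudocount and uniform background are illustrative assumptions.
    """
    pwm = []
    for column in pfm:
        total = sum(column) + 4 * pseudocount
        pwm.append(
            [math.log2(((count + pseudocount) / total) / background)
             for count in column]
        )
    return pwm


def best_hit(pwm, sequence):
    """Return (start, score) of the best forward-strand window, or None
    if the sequence is shorter than the matrix."""
    width = len(pwm)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][BASES.index(base)] for i, base in enumerate(window))
        if best is None or score > best[1]:
            best = (start, score)
    return best


# Hypothetical three-position motif that strongly prefers "ACG".
toy_pwm = pfm_to_pwm([[10, 0, 0, 0], [0, 10, 0, 0], [0, 0, 10, 0]])
hit = best_hit(toy_pwm, "TTACGTT")
```

A real analysis would also scan the reverse complement and convert scores to P-values or apply a per-motif threshold; those steps are omitted here.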
Affiliation(s)
- Ilya E Vorontsov
- Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991 Moscow, Russia
- Irina A Eliseeva
- Institute of Protein Research, Russian Academy of Sciences, 142290 Pushchino, Russia
- Arsenii Zinkevich
- Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991 Moscow, Russia
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 119991 Moscow, Russia
- Mikhail Nikonov
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 119991 Moscow, Russia
- Sergey Abramov
- Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991 Moscow, Russia
- Altius Institute for Biomedical Sciences, 98121 Seattle, WA, USA
- Alexandr Boytsov
- Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991 Moscow, Russia
- Altius Institute for Biomedical Sciences, 98121 Seattle, WA, USA
- Vasily Kamenets
- Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991 Moscow, Russia
- Moscow Institute of Physics and Technology, 141700 Dolgoprudny, Russia
- Institute of Biochemistry and Genetics of the Ufa Federal Research Centre of the Russian Academy of Sciences, 450054 Ufa, Russia
- Alexandra Kasianova
- Skolkovo Institute of Science and Technology, 121205 Moscow, Russia
- Institute for Information Transmission Problems of the Russian Academy of Sciences, 127051 Moscow, Russia
- Semyon Kolmykov
- Department of Computational Biology, Sirius University of Science and Technology, 354340 Sirius, Krasnodar region, Russia
- Alexander Favorov
- Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991 Moscow, Russia
- Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Yulia A Medvedeva
- Research Center of Biotechnology RAS, Russian Academy of Sciences, 119071 Moscow, Russia
- Arttu Jolma
- Donnelly Centre, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Fedor Kolpakov
- Department of Computational Biology, Sirius University of Science and Technology, 354340 Sirius, Krasnodar region, Russia
- Bioinformatics Laboratory, Federal Research Center for Information and Computational Technologies, 630090 Novosibirsk, Russia
- Vsevolod J Makeev
- Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991 Moscow, Russia
- Moscow Institute of Physics and Technology, 141700 Dolgoprudny, Russia
- Institute of Biochemistry and Genetics of the Ufa Federal Research Centre of the Russian Academy of Sciences, 450054 Ufa, Russia
- Ivan V Kulakovskiy
- Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991 Moscow, Russia
- Institute of Protein Research, Russian Academy of Sciences, 142290 Pushchino, Russia
- Laboratory of Regulatory Genomics, Institute of Fundamental Medicine and Biology, Kazan Federal University, 420008 Kazan, Russia
3
Rauluseviciute I, Riudavets-Puig R, Blanc-Mathieu R, Castro-Mondragon J, Ferenc K, Kumar V, Lemma RB, Lucas J, Chèneby J, Baranasic D, Khan A, Fornes O, Gundersen S, Johansen M, Hovig E, Lenhard B, Sandelin A, Wasserman W, Parcy F, Mathelier A. JASPAR 2024: 20th anniversary of the open-access database of transcription factor binding profiles. Nucleic Acids Res 2024; 52:D174-D182. [PMID: 37962376 PMCID: PMC10767809 DOI: 10.1093/nar/gkad1059] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 09/15/2023] [Revised: 10/20/2023] [Accepted: 10/31/2023] [Indexed: 11/15/2023]
Abstract
JASPAR (https://jaspar.elixir.no/) is a widely-used open-access database presenting manually curated high-quality and non-redundant DNA-binding profiles for transcription factors (TFs) across taxa. In this 10th release and 20th-anniversary update, the CORE collection has expanded with 329 new profiles. We updated three existing profiles and provided orthogonal support for 72 profiles from the previous release's UNVALIDATED collection. Altogether, the JASPAR 2024 update provides a 20% increase in CORE profiles from the previous release. A trimming algorithm enhanced profiles by removing low information content flanking base pairs, which were likely uninformative (within the capacity of the PFM models) for TFBS predictions and modelling TF-DNA interactions. This release includes enhanced metadata, featuring a refined classification for plant TFs' structural DNA-binding domains. The new JASPAR collections prompt updates to the genomic tracks of predicted TF binding sites (TFBSs) in 8 organisms, with human and mouse tracks available as native tracks in the UCSC Genome browser. All data are available through the JASPAR web interface and programmatically through its API and the updated Bioconductor and pyJASPAR packages. Finally, a new TFBS extraction tool enables users to retrieve predicted JASPAR TFBSs intersecting their genomic regions of interest.
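The flank trimming mentioned in this abstract can be illustrated with a minimal sketch based on per-position information content. This is not the JASPAR implementation: the function names, the 0.5-bit threshold, the pseudocount, and the uniform background are assumptions chosen for the example.

```python
import math


def information_content(column, pseudocount=0.01):
    """Information content in bits of one PFM column [A, C, G, T],
    measured against a uniform background (an assumption)."""
    total = sum(column) + 4 * pseudocount
    ic = 2.0  # log2(4), the maximum for a four-letter alphabet
    for count in column:
        p = (count + pseudocount) / total
        ic += p * math.log2(p)
    return ic


def trim_flanks(pfm, min_ic=0.5):
    """Drop leading and trailing columns whose information content is
    below min_ic. Low-information columns inside the motif are kept.
    The threshold is a hypothetical value, not one taken from JASPAR."""
    keep = [i for i, column in enumerate(pfm)
            if information_content(column) >= min_ic]
    if not keep:
        return []
    return pfm[keep[0]:keep[-1] + 1]


# Toy profile: uninformative flanks around an informative core
# that itself contains one uninformative position.
toy_pfm = [
    [25, 25, 25, 25],
    [100, 0, 0, 0],
    [25, 25, 25, 25],
    [0, 100, 0, 0],
    [26, 24, 25, 25],
]
trimmed = trim_flanks(toy_pfm)
```

Trimming only from the ends preserves spacer positions inside bipartite motifs, which is why the sketch does not simply filter every low-information column.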
Affiliation(s)
- Ieva Rauluseviciute
- Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
- Rafael Riudavets-Puig
- Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
- Romain Blanc-Mathieu
- Laboratoire Physiologie Cellulaire et Végétale, Univ. Grenoble Alpes, CNRS, CEA, INRAE, IRIG-DBSCI-LPCV, 17 avenue des martyrs, F-38054, Grenoble, France
- Jaime A Castro-Mondragon
- Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
- Katalin Ferenc
- Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
- Vipin Kumar
- Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
- Roza Berhanu Lemma
- Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
- Jérémy Lucas
- Laboratoire Physiologie Cellulaire et Végétale, Univ. Grenoble Alpes, CNRS, CEA, INRAE, IRIG-DBSCI-LPCV, 17 avenue des martyrs, F-38054, Grenoble, France
- Jeanne Chèneby
- Center for Bioinformatics, Department of Informatics, University of Oslo, Oslo, Norway
- Damir Baranasic
- MRC London Institute of Medical Sciences, Du Cane Road, London W12 0NN, UK
- Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, Hammersmith Hospital Campus, Du Cane Road, London W12 0NN, UK
- Division of Electronics, Ruđer Bošković Institute, Bijenička cesta, 10000 Zagreb, Croatia
- Aziz Khan
- Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
- Stanford Cancer Institute, Stanford University School of Medicine, Stanford, CA 94305, USA
- Oriol Fornes
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children's Hospital Research Institute, University of British Columbia, 950 W 28th Ave, Vancouver, BC V5Z 4H4, Canada
- Sveinung Gundersen
- Center for Bioinformatics, Department of Informatics, University of Oslo, Oslo, Norway
- Morten Johansen
- Center for Bioinformatics, Department of Informatics, University of Oslo, Oslo, Norway
- Eivind Hovig
- Center for Bioinformatics, Department of Informatics, University of Oslo, Oslo, Norway
- Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital, 0424 Oslo, Norway
- Boris Lenhard
- MRC London Institute of Medical Sciences, Du Cane Road, London W12 0NN, UK
- Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, Hammersmith Hospital Campus, Du Cane Road, London W12 0NN, UK
- Albin Sandelin
- Department of Biology and Biotech Research and Innovation Centre, University of Copenhagen, Ole Maaløes Vej 5, DK2200 Copenhagen N, Denmark
- Wyeth W Wasserman
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children's Hospital Research Institute, University of British Columbia, 950 W 28th Ave, Vancouver, BC V5Z 4H4, Canada
- François Parcy
- Laboratoire Physiologie Cellulaire et Végétale, Univ. Grenoble Alpes, CNRS, CEA, INRAE, IRIG-DBSCI-LPCV, 17 avenue des martyrs, F-38054, Grenoble, France
- Anthony Mathelier
- Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
- Center for Bioinformatics, Department of Informatics, University of Oslo, Oslo, Norway
- Department of Medical Genetics, Institute of Clinical Medicine, University of Oslo and Oslo University Hospital, Oslo, Norway
5
Schleussner N, Cauchy P, Franke V, Giefing M, Fornes O, Vankadari N, Assi SA, Costanza M, Weniger MA, Akalin A, Anagnostopoulos I, Bukur T, Casarotto MG, Damm F, Daumke O, Edginton-White B, Gebhardt JCM, Grau M, Grunwald S, Hansmann ML, Hartmann S, Huber L, Kärgel E, Lusatis S, Noerenberg D, Obier N, Pannicke U, Fischer A, Reisser A, Rosenwald A, Schwarz K, Sundararaj S, Weilemann A, Winkler W, Xu W, Lenz G, Rajewsky K, Wasserman WW, Cockerill PN, Scheidereit C, Siebert R, Küppers R, Grosschedl R, Janz M, Bonifer C, Mathas S. Transcriptional reprogramming by mutated IRF4 in lymphoma. Nat Commun 2023; 14:6947. [PMID: 37935654 PMCID: PMC10630337 DOI: 10.1038/s41467-023-41954-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2022] [Accepted: 09/20/2023] [Indexed: 11/09/2023]
Abstract
Disease-causing mutations in genes encoding transcription factors (TFs) can affect TF interactions with their cognate DNA-binding motifs. Whether and how TF mutations impact upon the binding to TF composite elements (CE) and the interaction with other TFs is unclear. Here, we report a distinct mechanism of TF alteration in human lymphomas with perturbed B cell identity, in particular classic Hodgkin lymphoma. It is caused by a recurrent somatic missense mutation c.295T>C (p.Cys99Arg; p.C99R) targeting the center of the DNA-binding domain of Interferon Regulatory Factor 4 (IRF4), a key TF in immune cells. IRF4-C99R fundamentally alters IRF4 DNA-binding, with loss-of-binding to canonical IRF motifs and neomorphic gain-of-binding to canonical and non-canonical IRF CEs. IRF4-C99R thoroughly modifies IRF4 function by blocking IRF4-dependent plasma cell induction, and up-regulates disease-specific genes in a non-canonical Activator Protein-1 (AP-1)-IRF-CE (AICE)-dependent manner. Our data explain how a single mutation causes a complex switch of TF specificity and gene regulation and open the perspective to specifically block the neomorphic DNA-binding activities of a mutant TF.
Affiliation(s)
- Nikolai Schleussner
- Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Biology of Malignant Lymphomas, 13125, Berlin, Germany
- Hematology, Oncology, and Cancer Immunology, Charité - Universitätsmedizin Berlin, Corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, Berlin Institute of Health, 10117, Berlin, Germany
- Experimental and Clinical Research Center (ECRC), a joint cooperation between Charité and MDC, Berlin, Germany
- Pierre Cauchy
- Max Planck Institute of Immunobiology and Epigenetics, 79108, Freiburg, Germany
- Institute of Cancer and Genomic Sciences, College of Medical and Dental Sciences, University of Birmingham, Birmingham, B15 2TT, UK
- University Medical Center Freiburg, 79106, Freiburg, Germany
- German Cancer Consortium (DKTK), German Cancer Research Center (DKFZ), 69120, Heidelberg, Germany
- Vedran Franke
- Bioinformatics and Omics Data Science Platform, Berlin Institute for Medical Systems Biology, Max-Delbrück-Center, Berlin, Germany
- Maciej Giefing
- Institute of Human Genetics, Polish Academy of Sciences, Poznan, 60-479, Poland
- Institute of Human Genetics, Christian-Albrechts-University Kiel, 24105, Kiel, Germany
- Oriol Fornes
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children's Hospital Research Institute, University of British Columbia, Vancouver, BC, V5Z 4H4, Canada
- Naveen Vankadari
- Department of Biochemistry and Pharmacology, Bio21 Molecular Science and Biotechnology Institute, The University of Melbourne, Melbourne, VIC, 3000, Australia
- Salam A Assi
- Institute of Cancer and Genomic Sciences, College of Medical and Dental Sciences, University of Birmingham, Birmingham, B15 2TT, UK
- Mariantonia Costanza
- Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Biology of Malignant Lymphomas, 13125, Berlin, Germany
- Hematology, Oncology, and Cancer Immunology, Charité - Universitätsmedizin Berlin, Corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, Berlin Institute of Health, 10117, Berlin, Germany
- Experimental and Clinical Research Center (ECRC), a joint cooperation between Charité and MDC, Berlin, Germany
- Marc A Weniger
- Institute of Cell Biology (Cancer Research), University of Duisburg-Essen, 45122, Essen, Germany
- Altuna Akalin
- Bioinformatics and Omics Data Science Platform, Berlin Institute for Medical Systems Biology, Max-Delbrück-Center, Berlin, Germany
- Ioannis Anagnostopoulos
- Institute of Pathology, Universität Würzburg and Comprehensive Cancer Centre Mainfranken (CCCMF), Würzburg, Germany
- Thomas Bukur
- TRON gGmbH - Translationale Onkologie an der Universitätsmedizin der Johannes Gutenberg-Universität Mainz, Mainz, Germany
- Marco G Casarotto
- Research School of Biology, The Australian National University, Canberra, ACT, Australia
- Frederik Damm
- Hematology, Oncology, and Cancer Immunology, Charité - Universitätsmedizin Berlin, Corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, Berlin Institute of Health, 10117, Berlin, Germany
- Oliver Daumke
- Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Structural Biology, 13125, Berlin, Germany
- Benjamin Edginton-White
- Institute of Cancer and Genomic Sciences, College of Medical and Dental Sciences, University of Birmingham, Birmingham, B15 2TT, UK
- Michael Grau
- Department of Physics, University of Marburg, 35052, Marburg, Germany
- Medical Department A for Hematology, Oncology and Pneumology, University Hospital Münster, Münster, Germany
- Stephan Grunwald
- Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Structural Biology, 13125, Berlin, Germany
- Martin-Leo Hansmann
- Frankfurt Institute of Advanced Studies, Frankfurt am Main, Germany
- Institute for Pharmacology and Toxicology, Goethe University, Frankfurt am Main, Germany
- Sylvia Hartmann
- Dr. Senckenberg Institute of Pathology, Goethe University Frankfurt, Frankfurt am Main, Germany
- Lionel Huber
- Max Planck Institute of Immunobiology and Epigenetics, 79108, Freiburg, Germany
- Eva Kärgel
- Signal Transduction in Tumor Cells, Max-Delbrück-Center for Molecular Medicine, Berlin, Germany
- Simone Lusatis
- Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Biology of Malignant Lymphomas, 13125, Berlin, Germany
- Hematology, Oncology, and Cancer Immunology, Charité - Universitätsmedizin Berlin, Corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, Berlin Institute of Health, 10117, Berlin, Germany
- Experimental and Clinical Research Center (ECRC), a joint cooperation between Charité and MDC, Berlin, Germany
- Daniel Noerenberg
- Hematology, Oncology, and Cancer Immunology, Charité - Universitätsmedizin Berlin, Corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, Berlin Institute of Health, 10117, Berlin, Germany
- Nadine Obier
- Max Planck Institute of Immunobiology and Epigenetics, 79108, Freiburg, Germany
- Institute of Cancer and Genomic Sciences, College of Medical and Dental Sciences, University of Birmingham, Birmingham, B15 2TT, UK
- Ulrich Pannicke
- Institute for Transfusion Medicine, University of Ulm, Ulm, Germany
- Anja Fischer
- Institute of Human Genetics, Ulm University and Ulm University Medical Center, 89081, Ulm, Germany
- Anja Reisser
- Department of Physics, Institute of Biophysics, Ulm University, Ulm, Germany
- Andreas Rosenwald
- Institute of Pathology, Universität Würzburg and Comprehensive Cancer Centre Mainfranken (CCCMF), Würzburg, Germany
- Klaus Schwarz
- Institute for Transfusion Medicine, University of Ulm, Ulm, Germany
- Institute for Clinical Transfusion Medicine and Immunogenetics Ulm, German Red Cross Blood Service Baden-Württemberg-Hessen, Ulm, Germany
- Srinivasan Sundararaj
- Research School of Biology, The Australian National University, Canberra, ACT, Australia
- Andre Weilemann
- Medical Department A for Hematology, Oncology and Pneumology, University Hospital Münster, Münster, Germany
- Wiebke Winkler
- Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Biology of Malignant Lymphomas, 13125, Berlin, Germany
- Hematology, Oncology, and Cancer Immunology, Charité - Universitätsmedizin Berlin, Corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, Berlin Institute of Health, 10117, Berlin, Germany
- Experimental and Clinical Research Center (ECRC), a joint cooperation between Charité and MDC, Berlin, Germany
- Wendan Xu
- Medical Department A for Hematology, Oncology and Pneumology, University Hospital Münster, Münster, Germany
- Georg Lenz
- Medical Department A for Hematology, Oncology and Pneumology, University Hospital Münster, Münster, Germany
- Klaus Rajewsky
- Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Immune Regulation and Cancer, 13125, Berlin, Germany
- Wyeth W Wasserman
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children's Hospital Research Institute, University of British Columbia, Vancouver, BC, V5Z 4H4, Canada
- Peter N Cockerill
- Institute of Cancer and Genomic Sciences, College of Medical and Dental Sciences, University of Birmingham, Birmingham, B15 2TT, UK
- Claus Scheidereit
- Signal Transduction in Tumor Cells, Max-Delbrück-Center for Molecular Medicine, Berlin, Germany
- Reiner Siebert
- Institute of Human Genetics, Christian-Albrechts-University Kiel, 24105, Kiel, Germany
- Institute of Human Genetics, Ulm University and Ulm University Medical Center, 89081, Ulm, Germany
- Ralf Küppers
- German Cancer Consortium (DKTK), German Cancer Research Center (DKFZ), 69120, Heidelberg, Germany
- Institute of Cell Biology (Cancer Research), University of Duisburg-Essen, 45122, Essen, Germany
- Rudolf Grosschedl
- Max Planck Institute of Immunobiology and Epigenetics, 79108, Freiburg, Germany
- Martin Janz
- Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Biology of Malignant Lymphomas, 13125, Berlin, Germany
- Hematology, Oncology, and Cancer Immunology, Charité - Universitätsmedizin Berlin, Corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, Berlin Institute of Health, 10117, Berlin, Germany
- Experimental and Clinical Research Center (ECRC), a joint cooperation between Charité and MDC, Berlin, Germany
- Constanze Bonifer
- Institute of Cancer and Genomic Sciences, College of Medical and Dental Sciences, University of Birmingham, Birmingham, B15 2TT, UK
- Stephan Mathas
- Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Biology of Malignant Lymphomas, 13125, Berlin, Germany
- Hematology, Oncology, and Cancer Immunology, Charité - Universitätsmedizin Berlin, Corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, Berlin Institute of Health, 10117, Berlin, Germany
- Experimental and Clinical Research Center (ECRC), a joint cooperation between Charité and MDC, Berlin, Germany
- German Cancer Consortium (DKTK), German Cancer Research Center (DKFZ), 69120, Heidelberg, Germany
7
Balcı AT, Ebeid MM, Benos PV, Kostka D, Chikina M. An intrinsically interpretable neural network architecture for sequence-to-function learning. Bioinformatics 2023; 39:i413-i422. [PMID: 37387140 PMCID: PMC10311317 DOI: 10.1093/bioinformatics/btad271] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/01/2023]
Abstract
MOTIVATION: Sequence-based deep learning approaches have been shown to predict a multitude of functional genomic readouts, including regions of open chromatin and RNA expression of genes. However, a major limitation of current methods is that model interpretation relies on computationally demanding post hoc analyses, and even then, one can often not explain the internal mechanics of highly parameterized models. Here, we introduce a deep learning architecture called totally interpretable sequence-to-function model (tiSFM). tiSFM improves upon the performance of standard multilayer convolutional models while using fewer parameters. Additionally, while tiSFM is itself technically a multilayer neural network, internal model parameters are intrinsically interpretable in terms of relevant sequence motifs.
RESULTS: We analyze published open chromatin measurements across hematopoietic lineage cell-types and demonstrate that tiSFM outperforms a state-of-the-art convolutional neural network model custom-tailored to this dataset. We also show that it correctly identifies context-specific activities of transcription factors with known roles in hematopoietic differentiation, including Pax5 and Ebf1 for B-cells, and Rorc for innate lymphoid cells. tiSFM's model parameters have biologically meaningful interpretations, and we show the utility of our approach on a complex task of predicting the change in epigenetic state as a function of developmental transition.
AVAILABILITY AND IMPLEMENTATION: The source code, including scripts for the analysis of key findings, can be found at https://github.com/boooooogey/ATAConv, implemented in Python.
Affiliation(s)
- Ali Tuğrul Balcı
- Joint Carnegie Mellon University-University of Pittsburgh Program in Computational Biology, Pittsburgh, PA 15213, United States
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA 15213, United States
- Mark Maher Ebeid
- Joint Carnegie Mellon University-University of Pittsburgh Program in Computational Biology, Pittsburgh, PA 15213, United States
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA 15213, United States
- Panayiotis V Benos
- Department of Epidemiology, University of Florida, Gainesville, FL 32610, United States
- Dennis Kostka
- Joint Carnegie Mellon University-University of Pittsburgh Program in Computational Biology, Pittsburgh, PA 15213, United States
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA 15213, United States
- Department of Developmental Biology, University of Pittsburgh, Pittsburgh, PA 15213, United States
- Maria Chikina
- Joint Carnegie Mellon University-University of Pittsburgh Program in Computational Biology, Pittsburgh, PA 15213, United States
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA 15213, United States