1
Martinez KM, Wilding K, Llewellyn TR, Jacobsen DE, Montoya MM, Kubicek-Sutherland JZ, Batni S, Manore C, Mukundan H. Evaluating the factors influencing accuracy, interpretability, and reproducibility in the use of machine learning classifiers in biology to enable standardization. Sci Rep 2025; 15:16651. [PMID: 40360553 PMCID: PMC12075784 DOI: 10.1038/s41598-025-00245-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2024] [Accepted: 04/27/2025] [Indexed: 05/15/2025] [Open Access]
Abstract
The complexity and variability of biological data have promoted the increased use of machine learning methods to understand processes and predict outcomes. These same features complicate reliable, reproducible, interpretable, and responsible use of such methods, resulting in questionable relevance of the derived outcomes. Here we systematically explore challenges associated with applying machine learning to predict and understand biological processes using a well-characterized in vitro experimental system. We evaluated factors that vary while applying machine learning classifiers: (1) type of biochemical signature (transcripts vs. proteins), (2) data curation methods (pre- and post-processing), and (3) choice of machine learning classifier. Using accuracy, generalizability, interpretability, and reproducibility as metrics, we found that the above factors significantly modulate outcomes even within a simple model system. Our results caution against the unregulated use of machine learning methods in the biological sciences, and strongly advocate the need for data standards and validation tool-kits for such studies.
Affiliation(s)
- Kaitlyn M Martinez
  - A-1 Information Systems and Modeling, Los Alamos National Laboratory, Los Alamos, NM, United States of America
- Kristen Wilding
  - T-6 Theoretical Biology and Biophysics, Los Alamos National Laboratory, Los Alamos, NM, United States of America
- Trent R Llewellyn
  - C-PCS Physical Chemistry and Applied Spectroscopy, Los Alamos National Laboratory, Los Alamos, NM, United States of America
- Daniel E Jacobsen
  - C-PCS Physical Chemistry and Applied Spectroscopy, Los Alamos National Laboratory, Los Alamos, NM, United States of America
- Makaela M Montoya
  - C-PCS Physical Chemistry and Applied Spectroscopy, Los Alamos National Laboratory, Los Alamos, NM, United States of America
- Jessica Z Kubicek-Sutherland
  - C-PCS Physical Chemistry and Applied Spectroscopy, Los Alamos National Laboratory, Los Alamos, NM, United States of America
- Sweta Batni
  - Defense Threat Reduction Agency, Fort Belvoir, VA, USA
- Carrie Manore
  - T-6 Theoretical Biology and Biophysics, Los Alamos National Laboratory, Los Alamos, NM, United States of America
- Harshini Mukundan
  - C-PCS Physical Chemistry and Applied Spectroscopy, Los Alamos National Laboratory, Los Alamos, NM, United States of America.
  - Bioscience Area, Lawrence Berkeley National Laboratory, Berkeley, CA, USA.
2
Kenter A, Singh H. An era of immunological discoveries heralded by molecular biology. Trends Immunol 2025; 46:364-371. [PMID: 40240192 DOI: 10.1016/j.it.2025.03.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2025] [Revised: 03/10/2025] [Accepted: 03/19/2025] [Indexed: 04/18/2025]
Abstract
The Molecular Mechanisms of Immune Cell Development and Function (MMICDF) meeting sponsored by the Federation of American Societies for Experimental Biology (FASEB) occupies a special niche because of its focus on the molecular mechanisms that underpin immunological processes. This biennial meeting, with its small groupings of participants and interactive nature, has provided a forum for intense, informative, and influential scientific discussions. The meeting is unique for its focus on molecular mechanisms that control the exceptional processes of DNA recombination, somatic hypermutation (SHM), and gene expression during immune cell development, activation, and differentiation. The organizers of the foundational meeting reflect on the coalescence of scientific advances that catalyzed its origin, review meeting highlights to celebrate its 20th anniversary, and project into the future.
Affiliation(s)
- Amy Kenter
  - Department of Microbiology and Immunology, University of Illinois College of Medicine, Chicago, IL, USA.
- Harinder Singh
  - Center for Systems Immunology, University of Pittsburgh, Pittsburgh, PA, USA; Department of Immunology, University of Pittsburgh, Pittsburgh, PA, USA; Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA
3
Smaruj PN, Xiao Y, Fudenberg G. Recipes and ingredients for deep learning models of 3D genome folding. Curr Opin Genet Dev 2025; 91:102308. [PMID: 39862604 PMCID: PMC11867851 DOI: 10.1016/j.gde.2024.102308] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2024] [Revised: 12/19/2024] [Accepted: 12/31/2024] [Indexed: 01/27/2025]
Abstract
Three-dimensional genome folding plays roles in gene regulation and disease. In this review, we compare and contrast recent deep learning models for predicting genome contact maps. We survey preprocessing, architecture, training, evaluation, and interpretation methods, highlighting the capabilities and limitations of different models. In each area, we highlight challenges, opportunities, and potential future directions for genome-folding models.
Affiliation(s)
- Paulina N Smaruj
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Yao Xiao
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Geoffrey Fudenberg
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA.
4
Hao R, Ao X, Xu Y, Gao M, Jia C, Dong X, Cirenluobu, Shang P, Ye Y, Wei Z. Enhancing oxygen utilization and mitigating oxidative stress in Tibetan chickens for adaptation to high-altitude hypoxia. Poult Sci 2025; 104:104893. [PMID: 40014967 PMCID: PMC11910141 DOI: 10.1016/j.psj.2025.104893] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2024] [Revised: 02/06/2025] [Accepted: 02/06/2025] [Indexed: 03/01/2025] [Open Access]
Abstract
The Tibetan chicken (TBC) is one of the native poultry species that is well adapted to the high-altitude environment of the Qinghai-Tibet Plateau. To elucidate the genetic mechanisms underlying adaptation, the transcriptomes of five tissues (heart (HE), lung (LU), liver (LI), ovary (OV), and abdominal fat (AB)) were compared between TBCs and Roman chickens (RMCs) inhabiting the plateau for one year. Moreover, weighted gene co-expression network analysis (WGCNA) was applied to detect tissue-associated modules and hub genes. A total of 1105, 239, 400, 483, and 275 differentially expressed genes (DEGs) were identified in the LI, HE, LU, AB, and OV tissues, respectively. Fifteen tissue-specific modules were identified in TBC and thirteen in RMC. Analysis of transcription factor (TF) binding sites revealed nineteen hub TFs in TBC and twenty in RMC across the pool of hub genes in these two breeds. Functional enrichment analyses demonstrated that TBC exhibited robust capacity for oxygen transport, heme binding, oxidative phosphorylation, and antioxidant responses in high-altitude regions. Further investigation of the function of hub TFs indicated the involvement of ATF4, CEBPA, TCF7L1, and GFI1B in improving oxygen transport in TBCs. These hub TFs were associated with angiogenesis or hematopoiesis, are likely linked to various regulatory functions, and likely facilitate communication across multiple tissues. In conclusion, TBCs have developed a systemic adaptive mechanism to cope with high altitudes, involving coordinated transcriptional regulation across multiple tissues to enhance oxygen transport and utilization, along with amelioration of oxidative stress.
Affiliation(s)
- Ruidong Hao
  - College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, PR China
- Xianpei Ao
  - College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, PR China
- Yijing Xu
  - College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, PR China
- Mengyu Gao
  - College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, PR China
- Cunling Jia
  - College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, PR China
- Xianggui Dong
  - College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, PR China
- Cirenluobu
  - Institute of Animal Husbandry and Veterinary, Tibet Academy of Agricultural and Animal Husbandry Sciences, Lhasa 860000, PR China
- Peng Shang
  - College of Animal Science, Tibet Agriculture and Animal Husbandry University, Nyingchi, Tibet 860000, PR China
- Yourong Ye
  - College of Animal Science, Tibet Agriculture and Animal Husbandry University, Nyingchi, Tibet 860000, PR China
- Zehui Wei
  - College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, PR China.
5
Qiu W, Dincer AB, Janizek JD, Celik S, Pittet MJ, Naxerova K, Lee SI. Deep profiling of gene expression across 18 human cancers. Nat Biomed Eng 2025; 9:333-355. [PMID: 39690287 DOI: 10.1038/s41551-024-01290-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/15/2022] [Accepted: 10/23/2024] [Indexed: 12/19/2024]
Abstract
Clinical and biological information in large datasets of gene expression across cancers could be tapped with unsupervised deep learning. However, difficulties associated with biological interpretability and methodological robustness have made this impractical. Here we describe an unsupervised deep-learning framework for the generation of low-dimensional latent spaces for gene-expression data from 50,211 transcriptomes across 18 human cancers. The framework, which we named DeepProfile, outperformed dimensionality-reduction methods with respect to biological interpretability and allowed us to unveil that genes that are universally important in defining latent spaces across cancer types control immune cell activation, whereas cancer-type-specific genes and pathways define molecular disease subtypes. By linking latent variables in DeepProfile to secondary characteristics of tumours, we discovered that mutation burden is closely associated with the expression of cell-cycle-related genes, and that the activity of biological pathways for DNA-mismatch repair and MHC class II antigen presentation is consistently associated with patient survival. We also found that tumour-associated macrophages are a source of survival-correlated MHC class II transcripts. Unsupervised learning can facilitate the discovery of biological insight from gene-expression data.
Affiliation(s)
- Wei Qiu
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
- Ayse B Dincer
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
- Joseph D Janizek
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
  - Medical Scientist Training Program, University of Washington, Seattle, WA, USA
- Safiye Celik
  - Recursion Pharmaceuticals, Salt Lake City, UT, USA
- Mikael J Pittet
  - Department of Pathology and Immunology, University of Geneva, Geneva, Switzerland
  - Ludwig Institute for Cancer Research, Lausanne Branch, Lausanne, Switzerland
  - Department of Oncology, Geneva University Hospitals, Geneva, Switzerland
  - AGORA Cancer Research Center and Swiss Cancer Center Leman, Lausanne, Switzerland
- Kamila Naxerova
  - Department of Genetics, Harvard Medical School, Boston, MA, USA.
  - Center for Systems Biology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA.
- Su-In Lee
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA.
6
Marderstein AR, Kundu S, Padhi EM, Deshpande S, Wang A, Robb E, Sun Y, Yun CM, Pomales-Matos D, Xie Y, Nachun D, Jessa S, Kundaje A, Montgomery SB. Mapping the regulatory effects of common and rare non-coding variants across cellular and developmental contexts in the brain and heart. bioRxiv 2025:2025.02.18.638922. [PMID: 40027628 PMCID: PMC11870466 DOI: 10.1101/2025.02.18.638922] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/05/2025]
Abstract
Whole genome sequencing has identified over a billion non-coding variants in humans, while GWAS has revealed the non-coding genome as a significant contributor to disease. However, prioritizing causal common and rare non-coding variants in human disease, and understanding how selective pressures have shaped the non-coding genome, remain significant challenges. Here, we predicted the effects of 15 million variants with deep learning models trained on single-cell ATAC-seq across 132 cellular contexts in adult and fetal brain and heart, producing nearly two billion context-specific predictions. Using these predictions, we distinguish candidate causal variants underlying human traits and diseases and their context-specific effects. While common variant effects are more cell-type-specific, rare variants exert more cell-type-shared regulatory effects, with selective pressures particularly targeting variants affecting fetal brain neurons. To prioritize de novo mutations with extreme regulatory effects, we developed FLARE, a context-specific functional genomic model of constraint. FLARE outperformed other methods in prioritizing case mutations from autism-affected families near syndromic autism-associated genes; for example, identifying mutation outliers near CNTNAP2 that would be missed by alternative approaches. Overall, our findings demonstrate the potential of integrating single-cell maps with population genetics and deep learning-based variant effect prediction to elucidate mechanisms of development and disease, ultimately supporting the notion that genetic contributions to neurodevelopmental disorders are predominantly rare.
Affiliation(s)
- Andrew R. Marderstein
  - Department of Pathology, Stanford University, Stanford, CA, USA
  - Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Soumya Kundu
  - Department of Computer Science, Stanford University, Stanford, CA, USA
  - Department of Genetics, Stanford University, Stanford, CA, USA
- Evin M. Padhi
  - Department of Pathology, Stanford University, Stanford, CA, USA
- Salil Deshpande
  - Department of Genetics, Stanford University, Stanford, CA, USA
  - Institute for Computational and Mathematical Engineering, Stanford University, Stanford, CA, USA
- Austin Wang
  - Department of Computer Science, Stanford University, Stanford, CA, USA
  - Department of Genetics, Stanford University, Stanford, CA, USA
- Esther Robb
  - Department of Computer Science, Stanford University, Stanford, CA, USA
  - Department of Genetics, Stanford University, Stanford, CA, USA
- Ying Sun
  - Department of Pathology, Stanford University, Stanford, CA, USA
- Chang M. Yun
  - Department of Chemical Engineering, Stanford University, Stanford, CA, USA
- Yilin Xie
  - Department of Pathology, Stanford University, Stanford, CA, USA
- Daniel Nachun
  - Department of Pathology, Stanford University, Stanford, CA, USA
- Selin Jessa
  - Department of Genetics, Stanford University, Stanford, CA, USA
- Anshul Kundaje
  - Department of Computer Science, Stanford University, Stanford, CA, USA
  - Department of Genetics, Stanford University, Stanford, CA, USA
- Stephen B. Montgomery
  - Department of Pathology, Stanford University, Stanford, CA, USA
  - Department of Genetics, Stanford University, Stanford, CA, USA
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
7
Chandra NA, Hu Y, Buenrostro JD, Mostafavi S, Sasse A. Refining the cis-regulatory grammar learned by sequence-to-activity models by increasing model resolution. bioRxiv 2025:2025.01.24.634804. [PMID: 39975126 PMCID: PMC11838202 DOI: 10.1101/2025.01.24.634804] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/21/2025]
Abstract
Chromatin accessibility can be measured genome-wide with ATAC-seq, enabling the discovery of regulatory regions that control gene expression and determine cell type. Deep genomic sequence-to-function (S2F) models link underlying genomic sequences to the measured chromatin state and identify motifs that regulate chromatin accessibility. Previously, we developed AI-TAC, a S2F model that predicts chromatin accessibility across 81 immune cell types and identifies sequence patterns that control their differential ATAC-seq signals. While AI-TAC provided valuable insights into the regulatory patterns that govern immune cell differentiation, later research established that ATAC-seq profiles (the distribution of Tn5 cuts) contain additional information about the exact location and strength of TF binding. To make use of this additional information, we developed bpAI-TAC, a multi-task neural network which models ATAC-seq at base-pair resolution across 90 immune cell types. We show that adding ATAC-profile information consistently improves predictions of differential chromatin accessibility. We also demonstrate that simultaneous learning of related cell types through multi-task modeling leads to better predictions than single task models. We then present a systematic framework for comparing how differences in model performance can be attributed to differences in what the model has learned. To understand what additional information bpAI-TAC gleans from ATAC-profiles, we use sequence attributions and identify motifs that have different effect sizes when trained on profiles. We conclude that modeling ATAC-seq at base-pair resolution enables the model to learn a more sensitive representation of the regulatory syntax that drives differences between immunocytes, and therefore will improve predictions of variant effects.
Affiliation(s)
- Nuria Alina Chandra
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, WA, USA, 98195
- Yan Hu
  - Gene Regulation Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, 02142 USA
  - Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, 02138 USA
- Jason D. Buenrostro
  - Gene Regulation Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, 02142 USA
  - Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, 02138 USA
- Sara Mostafavi
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, WA, USA, 98195
  - Canadian Institute for Advanced Research, Toronto, ON, Canada, MG51ZB
- Alexander Sasse
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, WA, USA, 98195
  - Heidelberg University, Heidelberg, Germany, 69120
  - Center for Molecular Biology Heidelberg (ZMBH), Heidelberg, Germany, 69120
  - Center for Synthetic Genomics, Heidelberg, Germany, 69120
8
Pampari A, Shcherbina A, Kvon EZ, Kosicki M, Nair S, Kundu S, Kathiria AS, Risca VI, Kuningas K, Alasoo K, Greenleaf WJ, Pennacchio LA, Kundaje A. ChromBPNet: bias factorized, base-resolution deep learning models of chromatin accessibility reveal cis-regulatory sequence syntax, transcription factor footprints and regulatory variants. bioRxiv 2025:2024.12.25.630221. [PMID: 39829783 PMCID: PMC11741299 DOI: 10.1101/2024.12.25.630221] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/22/2025]
Abstract
Despite extensive mapping of cis-regulatory elements (cREs) across cellular contexts with chromatin accessibility assays, the sequence syntax and genetic variants that regulate transcription factor (TF) binding and chromatin accessibility at context-specific cREs remain elusive. We introduce ChromBPNet, a deep learning DNA sequence model of base-resolution accessibility profiles that detects, learns and deconvolves assay-specific enzyme biases from regulatory sequence determinants of accessibility, enabling robust discovery of compact TF motif lexicons, cooperative motif syntax and precision footprints across assays and sequencing depths. Extensive benchmarks show that ChromBPNet, despite its lightweight design, is competitive with much larger contemporary models at predicting variant effects on chromatin accessibility, pioneer TF binding and reporter activity across assays, cell contexts and ancestry, while providing interpretation of disrupted regulatory syntax. ChromBPNet also helps prioritize and interpret regulatory variants that influence complex traits and rare diseases, thereby providing a powerful lens to decode regulatory DNA and genetic variation.
Affiliation(s)
- Anusri Pampari
  - Department of Computer Science, Stanford University, Stanford CA, 94305
- Anna Shcherbina
  - Department of Biomedical Data Sciences, Stanford University, Stanford CA, 94305
- Evgeny Z. Kvon
  - Department of Developmental and Cell Biology, University of California, Irvine, CA 92697, USA
- Michael Kosicki
  - Environmental Genomics & System Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Surag Nair
  - Department of Computer Science, Stanford University, Stanford CA, 94305
- Soumya Kundu
  - Department of Computer Science, Stanford University, Stanford CA, 94305
- Kaur Alasoo
  - Institute of Computer Science, University of Tartu, Tartu, Estonia
- William James Greenleaf
  - Department of Genetics, Stanford University, Stanford CA, 94305
  - Department of Applied Physics, Stanford University, Stanford, California 94305, USA
- Len A. Pennacchio
  - Environmental Genomics & System Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Anshul Kundaje
  - Department of Computer Science, Stanford University, Stanford CA, 94305
  - Department of Genetics, Stanford University, Stanford CA, 94305
9
Chung HK, Liu C, Jambor AN, Riesenberg BP, Sun M, Casillas E, Chick B, Wang A, Wang J, Ma S, Mcdonald B, He P, Yang Q, Chen T, Varanasi SK, LaPorte M, Mann TH, Chen D, Hoffmann F, Tripple V, Ho J, Modliszewski J, Williams A, Cho UH, Liu L, Wang Y, Hargreaves DC, Thaxton JE, Kaech SM, Wang W. Multi-Omics Atlas-Assisted Discovery of Transcription Factors for Selective T Cell State Programming. bioRxiv 2025:2023.01.03.522354. [PMID: 36711632 PMCID: PMC9881845 DOI: 10.1101/2023.01.03.522354] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2023]
Abstract
Transcription factors (TFs) regulate the differentiation of T cells into diverse states with distinct functionalities. To precisely program desired T cell states in viral infections and cancers, we generated a comprehensive transcriptional and epigenetic atlas of nine CD8+ T cell differentiation states for TF activity prediction. Our analysis catalogued TF activity fingerprints of each state, uncovering new regulatory mechanisms that govern selective cell state differentiation. Leveraging this platform, we focused on two critical T cell states in tumor and virus control: terminally exhausted T cells (TEXterm), which are dysfunctional, and tissue-resident memory T cells (TRM), which are protective. Despite their functional differences, these states share significant transcriptional and anatomical similarities, making it both challenging and essential to engineer T cells that avoid TEXterm differentiation while preserving beneficial TRM characteristics. Through in vivo CRISPR screening combined with single-cell RNA sequencing (Perturb-seq), we validated the specific TFs driving the TEXterm state and confirmed the accuracy of TF specificity predictions. Importantly, we discovered novel TEXterm-specific TFs such as ZSCAN20, JDP2, and ZFP324. The deletion of these TEXterm-specific TFs in T cells enhanced tumor control and synergized with immune checkpoint blockade. Additionally, this study identified multi-state TFs like HIC1 and GFI1, which are vital for both TEXterm and TRM states. Furthermore, our global TF community analysis and Perturb-seq experiments revealed how TFs differentially regulate key processes in TRM and TEXterm cells, uncovering new biological pathways like protein catabolism that are specifically linked to TEXterm differentiation.
In summary, our platform systematically identifies TF programs across diverse T cell states, facilitating the engineering of specific T cell states to improve tumor control and providing insights into the cellular mechanisms underlying their functional disparities.
10
Tian Y, Wu X, Luo S, Xiong D, Liu R, Hu L, Yuan Y, Shi G, Yao J, Huang Z, Fu F, Yang X, Tang Z, Zhang J, Hu K. A multi-omic single-cell landscape of cellular diversification in the developing human cerebral cortex. Comput Struct Biotechnol J 2024; 23:2173-2189. [PMID: 38827229 PMCID: PMC11141146 DOI: 10.1016/j.csbj.2024.05.019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2024] [Revised: 05/09/2024] [Accepted: 05/13/2024] [Indexed: 06/04/2024] [Open Access]
Abstract
The vast neuronal diversity in the human neocortex is vital for high-order brain functions, necessitating elucidation of the regulatory mechanisms underlying such unparalleled diversity. However, recent studies have yet to comprehensively reveal the diversity of neurons and the molecular logic of neocortical origin in humans at single-cell resolution through profiling transcriptomic or epigenomic landscapes, owing to the application of unimodal data alone to depict exceedingly heterogeneous populations of neurons. In this study, we generated a comprehensive compendium of the developing human neocortex by simultaneously profiling gene expression and open chromatin from the same cell. We computationally reconstructed the differentiation trajectories of excitatory projection neurons of cortical origin and inferred the regulatory logic governing lineage bifurcation decisions for neuronal diversification. We demonstrated that neuronal diversity arises from progenitor cell lineage specificity and postmitotic differentiation at distinct stages. Our data paves the way for understanding the primarily coordinated regulatory logic for neuronal diversification in the neocortex.
Affiliation(s)
- Yuhan Tian
  - Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510275, China
- Xia Wu
  - Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510275, China
- Songhao Luo
  - School of Mathematics, Sun Yat-sen University, Guangzhou 510275, China
- Dan Xiong
  - Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510275, China
- Rong Liu
  - Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510275, China
- Lanqi Hu
  - Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510275, China
- Yuchen Yuan
  - Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510275, China
- Guowei Shi
  - Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510275, China
- Junjie Yao
  - Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510275, China
- Zhiwei Huang
  - School of Mathematics, Sun Yat-sen University, Guangzhou 510275, China
- Fang Fu
  - Guangzhou Women and Children’s Medical Center, Guangzhou Medical University, Guangzhou 511436, China
- Xin Yang
  - Guangzhou Women and Children’s Medical Center, Guangzhou Medical University, Guangzhou 511436, China
- Zhonghui Tang
  - Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510275, China
- Jiajun Zhang
  - School of Mathematics, Sun Yat-sen University, Guangzhou 510275, China
- Kunhua Hu
  - Guangdong Provincial Key Laboratory of Brain Function and Disease, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510275, China
  - Public Platform Laboratory, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou 510630, China
11
Kenry. Machine-learning-guided quantitative delineation of cell morphological features and responses to nanomaterials. Nanoscale 2024; 16:19656-19668. [PMID: 39373030 DOI: 10.1039/d4nr02466d] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/08/2024]
Abstract
Delineation of cell morphological features is essential to decipher cell responses to external stimuli like theranostic nanomaterials. Conventional methods rely on labeled approaches, such as fluorescence imaging and flow cytometry, to assess cell responses. Besides potentially perturbing cell structure and morphology, these approaches are relatively complex, time-consuming, expensive, and may not be compatible with downstream analysis involving live cells. Herein, leveraging label-free phase-contrast or brightfield microscopy imaging and machine learning, the delineation of different cell types, phenotypes, and states for monitoring live cell responses is reported. Notably, pixel classification based on a supervised random forest classifier is used to distinguish between cells and backgrounds from the microscopy images, followed by cell segmentation and morphological feature extraction. Quantitative analysis shows that most of the compared cell groups have distinguishable size and shape features. Principal component analysis and unsupervised k-means clustering of morphological features reveal the possible existence of heterogeneous cell subpopulations and treatment responses among the seemingly homogeneous cell groups. This shows the merit of the reported approach in complementing conventional techniques for cell analysis. It is anticipated that the demonstrated method will further aid the implementation of machine learning to streamline the analysis of cell morphology and responses for early disease diagnosis and treatment response monitoring.
Affiliation(s)
- Kenry
- Department of Pharmacology and Toxicology, R. Ken Coit College of Pharmacy, University of Arizona, Tucson, AZ 85721, USA.
- University of Arizona Cancer Center, University of Arizona, Tucson, AZ 85721, USA
- BIO5 Institute, University of Arizona, Tucson, AZ 85721, USA
12
Qiu W, Dincer AB, Janizek JD, Celik S, Pittet M, Naxerova K, Lee SI. A deep profile of gene expression across 18 human cancers. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.03.17.585426. [PMID: 38559197 PMCID: PMC10980029 DOI: 10.1101/2024.03.17.585426] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
Clinically and biologically valuable information may reside untapped in large cancer gene expression data sets. Deep unsupervised learning has the potential to extract this information with unprecedented efficacy but has thus far been hampered by a lack of biological interpretability and robustness. Here, we present DeepProfile, a comprehensive framework that addresses current challenges in applying unsupervised deep learning to gene expression profiles. We use DeepProfile to learn low-dimensional latent spaces for 18 human cancers from 50,211 transcriptomes. DeepProfile outperforms existing dimensionality reduction methods with respect to biological interpretability. Using DeepProfile interpretability methods, we show that genes that are universally important in defining the latent spaces across all cancer types control immune cell activation, while cancer type-specific genes and pathways define molecular disease subtypes. By linking DeepProfile latent variables to secondary tumor characteristics, we discover that tumor mutation burden is closely associated with the expression of cell cycle-related genes. DNA mismatch repair and MHC class II antigen presentation pathway expression, on the other hand, are consistently associated with patient survival. We validate these results through Kaplan-Meier analyses and nominate tumor-associated macrophages as an important source of survival-correlated MHC class II transcripts. Our results illustrate the power of unsupervised deep learning for discovery of cancer biology from existing gene expression data.
Affiliation(s)
- Wei Qiu
- Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA
- Ayse B. Dincer
- Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA
- Joseph D. Janizek
- Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA
- Medical Scientist Training Program, University of Washington, Seattle, WA
- Mikael Pittet
- Department of Pathology and Immunology, University of Geneva, Switzerland
- Ludwig Institute for Cancer Research, Lausanne Branch, Switzerland
- Kamila Naxerova
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Center for Systems Biology, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, USA
- Su-In Lee
- Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA
13
Awdeh A, Turcotte M, Perkins TJ. Identifying transcription factors with cell-type specific DNA binding signatures. BMC Genomics 2024; 25:957. [PMID: 39402535 PMCID: PMC11472444 DOI: 10.1186/s12864-024-10859-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2024] [Accepted: 10/02/2024] [Indexed: 10/19/2024] Open
Abstract
BACKGROUND: Transcription factors (TFs) bind to different parts of the genome in different types of cells, but it is usually assumed that the inherent DNA-binding preferences of a TF are invariant to cell type. Yet, there are several known examples of TFs that switch their DNA-binding preferences in different cell types, and yet more examples of other mechanisms, such as steric hindrance or cooperative binding, that may result in a "DNA signature" of differential binding. RESULTS: To survey this phenomenon systematically, we developed a deep learning method we call SigTFB (Signatures of TF Binding) to detect and quantify cell-type specificity in a TF's known genomic binding sites. We used ENCODE ChIP-seq data to conduct a wide scale investigation of 169 distinct TFs in up to 14 distinct cell types. SigTFB detected statistically significant DNA binding signatures in approximately two-thirds of TFs, far more than might have been expected from the relatively sparse evidence in prior literature. We found that the presence or absence of a cell-type specific DNA binding signature is distinct from, and indeed largely uncorrelated to, the degree of overlap between ChIP-seq peaks in different cell types, and tended to arise by two mechanisms: using established motifs in different frequencies, and by selective inclusion of motifs for distinct TFs. CONCLUSIONS: While recent results have highlighted cell state features such as chromatin accessibility and gene expression in predicting TF binding, our results emphasize that, for some TFs, the DNA sequences of the binding sites contain substantial cell-type specific motifs.
Affiliation(s)
- Aseel Awdeh
- School of Electrical Engineering and Computer Science, University of Ottawa, 800 King Edward Ave., Ottawa, K1N 6N5, Ontario, Canada
- Regenerative Medicine Program, Ottawa Hospital Research Institute, 501 Smyth Rd., Ottawa, K1H 8L6, Ontario, Canada
- Marcel Turcotte
- School of Electrical Engineering and Computer Science, University of Ottawa, 800 King Edward Ave., Ottawa, K1N 6N5, Ontario, Canada
- Theodore J Perkins
- School of Electrical Engineering and Computer Science, University of Ottawa, 800 King Edward Ave., Ottawa, K1N 6N5, Ontario, Canada.
- Regenerative Medicine Program, Ottawa Hospital Research Institute, 501 Smyth Rd., Ottawa, K1H 8L6, Ontario, Canada.
- Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Rd., Ottawa, K1H 8M5, Ontario, Canada.
14
Sasse A, Chikina M, Mostafavi S. Quick and effective approximation of in silico saturation mutagenesis experiments with first-order taylor expansion. iScience 2024; 27:110807. [PMID: 39286491 PMCID: PMC11404212 DOI: 10.1016/j.isci.2024.110807] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2024] [Revised: 08/08/2024] [Accepted: 08/20/2024] [Indexed: 09/19/2024] Open
Abstract
To understand the decision process of genomic sequence-to-function models, explainable AI algorithms determine the importance of each nucleotide in a given input sequence to the model's predictions and enable discovery of cis-regulatory motifs for gene regulation. The most commonly applied method is in silico saturation mutagenesis (ISM) because its per-nucleotide importance scores can be intuitively understood as the computational counterpart to in vivo saturation mutagenesis experiments. While ISM is highly interpretable, it is computationally challenging to perform for many sequences, and becomes prohibitive as the length of the input sequences and size of the model grows. Here, we use the first-order Taylor approximation to approximate ISM values from the model's gradient, which reduces its computation cost to a single forward pass for an input sequence. We show that the Taylor ISM (TISM) approximation is robust across different model ablations, random initializations, training parameters, and dataset sizes.
Affiliation(s)
- Alexander Sasse
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, USA
- Maria Chikina
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA 16354, USA
- Sara Mostafavi
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, USA
- Canadian Institute for Advanced Research, Toronto, ON MG5 1ZB, Canada
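
The first-order Taylor idea summarized in the Sasse et al. abstract can be illustrated with a small sketch. It is not the authors' implementation: the toy model (a tanh over a weighted one-hot sequence), the weights, and all function names are hypothetical and chosen only so the example runs with the Python standard library. For a one-hot input, swapping the reference base at position i for base b changes the input by +1 at (i, b) and -1 at (i, ref), so the Taylor estimate of the ISM score is the gradient at (i, b) minus the gradient at (i, ref).

```python
import math

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a list of 4-element one-hot rows."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq]

def model(x, weights, bias=0.0):
    """Hypothetical toy sequence-to-function model: tanh of a weighted sum."""
    s = bias + sum(w * v for w_row, x_row in zip(weights, x)
                   for w, v in zip(w_row, x_row))
    return math.tanh(s)

def gradient(x, weights, bias=0.0):
    """Analytic gradient of the toy model with respect to every input entry."""
    y = model(x, weights, bias)
    scale = 1.0 - y * y  # derivative of tanh at the current pre-activation
    return [[scale * w for w in w_row] for w_row in weights]

def ism(seq, weights, bias=0.0):
    """Exact ISM: one model evaluation per position and alternative base."""
    ref = model(one_hot(seq), weights, bias)
    scores = []
    for i in range(len(seq)):
        row = []
        for b in BASES:
            mutated = seq[:i] + b + seq[i + 1:]
            row.append(model(one_hot(mutated), weights, bias) - ref)
        scores.append(row)
    return scores

def tism(seq, weights, bias=0.0):
    """Taylor estimate: gradient at (i, b) minus gradient at the reference base."""
    grad = gradient(one_hot(seq), weights, bias)
    scores = []
    for i, base in enumerate(seq):
        ref_idx = BASES.index(base)
        scores.append([g - grad[i][ref_idx] for g in grad[i]])
    return scores

# Illustrative weights for a 3-nucleotide sequence (made up for this sketch).
weights = [[0.05, -0.02, 0.01, 0.03],
           [-0.04, 0.02, 0.06, -0.01],
           [0.03, 0.01, -0.05, 0.02]]
seq = "ACG"
exact = ism(seq, weights)
approx = tism(seq, weights)
max_gap = max(abs(e - a) for er, ar in zip(exact, approx)
              for e, a in zip(er, ar))
```

Here `ism` needs one model evaluation per candidate substitution, while `tism` reuses a single gradient for all of them. With weights this small the two score tables should agree closely; the gap is expected to widen as substitutions move the model further from its locally linear regime.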
15
Liu J, Castillo-Hair SM, Du LY, Wang Y, Carte AN, Colomer-Rosell M, Yin C, Seelig G, Schier AF. Dissecting the regulatory logic of specification and differentiation during vertebrate embryogenesis. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.08.27.609971. [PMID: 39253514 PMCID: PMC11383055 DOI: 10.1101/2024.08.27.609971] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/11/2024]
Abstract
The interplay between transcription factors and chromatin accessibility regulates cell type diversification during vertebrate embryogenesis. To systematically decipher the gene regulatory logic guiding this process, we generated a single-cell multi-omics atlas of RNA expression and chromatin accessibility during early zebrafish embryogenesis. We developed a deep learning model to predict chromatin accessibility based on DNA sequence and found that a small number of transcription factors underlie cell-type-specific chromatin landscapes. While Nanog is well-established in promoting pluripotency, we discovered a new function in priming the enhancer accessibility of mesendodermal genes. In addition to the classical stepwise mode of differentiation, we describe instant differentiation, where pluripotent cells skip intermediate fate transitions and terminally differentiate. Reconstruction of gene regulatory interactions reveals that this process is driven by a shallow network in which maternally deposited regulators activate a small set of transcription factors that co-regulate hundreds of differentiation genes. Notably, misexpression of these transcription factors in pluripotent cells is sufficient to ectopically activate their targets. This study provides a rich resource for analyzing embryonic gene regulation and reveals the regulatory logic of instant differentiation.
Affiliation(s)
- Jialin Liu
- Biozentrum, University of Basel, Basel, 4056, Switzerland
- Allen Discovery Center for Cell Lineage Tracing, University of Washington, Seattle, WA, 98195, USA
- Lucia Y. Du
- Biozentrum, University of Basel, Basel, 4056, Switzerland
- Allen Discovery Center for Cell Lineage Tracing, University of Washington, Seattle, WA, 98195, USA
- Yiqun Wang
- Biozentrum, University of Basel, Basel, 4056, Switzerland
- Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, UCSD, La Jolla, CA, 92037, USA
- Adam N. Carte
- Biozentrum, University of Basel, Basel, 4056, Switzerland
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, 02115, USA
- Mariona Colomer-Rosell
- Biozentrum, University of Basel, Basel, 4056, Switzerland
- Allen Discovery Center for Cell Lineage Tracing, University of Washington, Seattle, WA, 98195, USA
- Christopher Yin
- Department of Electrical & Computer Engineering, University of Washington, Seattle, WA, 98195, USA
- Georg Seelig
- Department of Electrical & Computer Engineering, University of Washington, Seattle, WA, 98195, USA
- Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA, 98195, USA
- Alexander F. Schier
- Biozentrum, University of Basel, Basel, 4056, Switzerland
- Allen Discovery Center for Cell Lineage Tracing, University of Washington, Seattle, WA, 98195, USA
16
Zhu T, Xia C, Yu R, Zhou X, Xu X, Wang L, Zong Z, Yang J, Liu Y, Ming L, You Y, Chen D, Xie W. Comprehensive mapping and modelling of the rice regulome landscape unveils the regulatory architecture underlying complex traits. Nat Commun 2024; 15:6562. [PMID: 39095348 PMCID: PMC11297339 DOI: 10.1038/s41467-024-50787-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2023] [Accepted: 07/19/2024] [Indexed: 08/04/2024] Open
Abstract
Unraveling the regulatory mechanisms that govern complex traits is pivotal for advancing crop improvement. Here we present a comprehensive regulome atlas for rice (Oryza sativa), charting the chromatin accessibility across 23 distinct tissues from three representative varieties. Our study uncovers 117,176 unique open chromatin regions (OCRs), accounting for ~15% of the rice genome, a notably higher proportion compared to previous reports in plants. Integrating RNA-seq data from matched tissues, we confidently predict 59,075 OCR-to-gene links, with enhancers constituting 69.54% of these associations, including many known enhancer-to-gene links. Leveraging this resource, we re-evaluate genome-wide association study results and discover a previously unknown function of OsbZIP06 in seed germination, which we subsequently confirm through experimental validation. We optimize deep learning models to decode regulatory grammar, achieving robust modeling of tissue-specific chromatin accessibility. This approach allows prediction of cross-variety regulatory dynamics from genomic sequences, shedding light on the genetic underpinnings of cis-regulatory divergence and morphological disparities between varieties. Overall, our study establishes a foundational resource for rice functional genomics and precision molecular breeding, providing valuable insights into regulatory mechanisms governing complex traits.
Affiliation(s)
- Tao Zhu
- State Key Laboratory of Pharmaceutical Biotechnology, Department of Gastroenterology, Nanjing Drum Tower Hospital, National Resource Center for Mutant Mice, School of Life Sciences, Nanjing University, Nanjing, 210023, China
- Chemistry and Biomedicine Innovation Center, Nanjing University, Nanjing, 210023, China
- Chunjiao Xia
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, 430070, China
- Ranran Yu
- State Key Laboratory of Pharmaceutical Biotechnology, Department of Gastroenterology, Nanjing Drum Tower Hospital, National Resource Center for Mutant Mice, School of Life Sciences, Nanjing University, Nanjing, 210023, China
- Xinkai Zhou
- State Key Laboratory of Pharmaceutical Biotechnology, Department of Gastroenterology, Nanjing Drum Tower Hospital, National Resource Center for Mutant Mice, School of Life Sciences, Nanjing University, Nanjing, 210023, China
- Xingbing Xu
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, 430070, China
- Lin Wang
- State Key Laboratory of Pharmaceutical Biotechnology, Department of Gastroenterology, Nanjing Drum Tower Hospital, National Resource Center for Mutant Mice, School of Life Sciences, Nanjing University, Nanjing, 210023, China
- Zhanxiang Zong
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, 430070, China
- Junjiao Yang
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, 430070, China
- Yinmeng Liu
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, 430070, China
- Luchang Ming
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, 430070, China
- Yuxin You
- State Key Laboratory of Pharmaceutical Biotechnology, Department of Gastroenterology, Nanjing Drum Tower Hospital, National Resource Center for Mutant Mice, School of Life Sciences, Nanjing University, Nanjing, 210023, China
- Dijun Chen
- State Key Laboratory of Pharmaceutical Biotechnology, Department of Gastroenterology, Nanjing Drum Tower Hospital, National Resource Center for Mutant Mice, School of Life Sciences, Nanjing University, Nanjing, 210023, China.
- Chemistry and Biomedicine Innovation Center, Nanjing University, Nanjing, 210023, China.
- Weibo Xie
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, 430070, China.
- Shenzhen Institute of Nutrition and Health, Huazhong Agricultural University, Wuhan, 430070, China.
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China.
17
Sokolova K, Chen KM, Hao Y, Zhou J, Troyanskaya OG. Deep Learning Sequence Models for Transcriptional Regulation. Annu Rev Genomics Hum Genet 2024; 25:105-122. [PMID: 38594933 DOI: 10.1146/annurev-genom-021623-024727] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/11/2024]
Abstract
Deciphering the regulatory code of gene expression and interpreting the transcriptional effects of genome variation are critical challenges in human genetics. Modern experimental technologies have resulted in an abundance of data, enabling the development of sequence-based deep learning models that link patterns embedded in DNA to the biochemical and regulatory properties contributing to transcriptional regulation, including modeling epigenetic marks, 3D genome organization, and gene expression, with tissue and cell-type specificity. Such methods can predict the functional consequences of any noncoding variant in the human genome, even rare or never-before-observed variants, and systematically characterize their consequences beyond what is tractable from experiments or quantitative genetics studies alone. Recently, the development and application of interpretability approaches have led to the identification of key sequence patterns contributing to the predicted tasks, providing insights into the underlying biological mechanisms learned and revealing opportunities for improvement in future models.
Affiliation(s)
- Ksenia Sokolova
- Department of Computer Science and Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, USA
- Kathleen M Chen
- Department of Computer Science and Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, USA
- Yun Hao
- Flatiron Institute, Simons Foundation, New York, NY, USA
- Jian Zhou
- Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, Texas, USA
- Olga G Troyanskaya
- Princeton Precision Health, Princeton University, Princeton, New Jersey, USA
- Flatiron Institute, Simons Foundation, New York, NY, USA
- Department of Computer Science and Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, USA
18
Kathail P, Shuai RW, Chung R, Ye CJ, Loeb GB, Ioannidis NM. Current genomic deep learning models display decreased performance in cell type-specific accessible regions. Genome Biol 2024; 25:202. [PMID: 39090688 PMCID: PMC11293111 DOI: 10.1186/s13059-024-03335-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Accepted: 07/10/2024] [Indexed: 08/04/2024] Open
Abstract
BACKGROUND: A number of deep learning models have been developed to predict epigenetic features such as chromatin accessibility from DNA sequence. Model evaluations commonly report performance genome-wide; however, cis regulatory elements (CREs), which play critical roles in gene regulation, make up only a small fraction of the genome. Furthermore, cell type-specific CREs contain a large proportion of complex disease heritability. RESULTS: We evaluate genomic deep learning models in chromatin accessibility regions with varying degrees of cell type specificity. We assess two modeling directions in the field: general purpose models trained across thousands of outputs (cell types and epigenetic marks) and models tailored to specific tissues and tasks. We find that the accuracy of genomic deep learning models, including two state-of-the-art general purpose models (Enformer and Sei), varies across the genome and is reduced in cell type-specific accessible regions. Using accessibility models trained on cell types from specific tissues, we find that increasing model capacity to learn cell type-specific regulatory syntax, through single-task learning or high capacity multi-task models, can improve performance in cell type-specific accessible regions. We also observe that improving reference sequence predictions does not consistently improve variant effect predictions, indicating that novel strategies are needed to improve performance on variants. CONCLUSIONS: Our results provide a new perspective on the performance of genomic deep learning models, showing that performance varies across the genome and is particularly reduced in cell type-specific accessible regions. We also identify strategies to maximize performance in cell type-specific accessible regions.
Affiliation(s)
- Pooja Kathail
- Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA.
- Richard W Shuai
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA, USA
- Ryan Chung
- Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA
- Chun Jimmie Ye
- Division of Rheumatology, Department of Medicine, University of California, San Francisco, CA, USA
- Institute for Human Genetics, University of California, San Francisco, CA, USA
- Department of Epidemiology and Biostatistics, University of California, San Francisco, CA, USA
- Bakar Computational Health Sciences Institute, University of California, San Francisco, CA, USA
- Parker Institute for Cancer Immunotherapy, San Francisco, CA, USA
- Chan Zuckerberg Biohub, San Francisco, CA, USA
- Gabriel B Loeb
- Division of Nephrology, Department of Medicine, University of California, San Francisco, CA, USA.
- Cardiovascular Research Institute, University of California, San Francisco, CA, USA.
- Nilah M Ioannidis
- Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA.
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA, USA.
- Chan Zuckerberg Biohub, San Francisco, CA, USA.
19
Kathail P, Shuai RW, Chung R, Ye CJ, Loeb GB, Ioannidis NM. Current genomic deep learning models display decreased performance in cell type specific accessible regions. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.07.05.602265. [PMID: 39026761 PMCID: PMC11257480 DOI: 10.1101/2024.07.05.602265] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/20/2024]
Abstract
Background: A number of deep learning models have been developed to predict epigenetic features such as chromatin accessibility from DNA sequence. Model evaluations commonly report performance genome-wide; however, cis regulatory elements (CREs), which play critical roles in gene regulation, make up only a small fraction of the genome. Furthermore, cell type specific CREs contain a large proportion of complex disease heritability. Results: We evaluate genomic deep learning models in chromatin accessibility regions with varying degrees of cell type specificity. We assess two modeling directions in the field: general purpose models trained across thousands of outputs (cell types and epigenetic marks), and models tailored to specific tissues and tasks. We find that the accuracy of genomic deep learning models, including two state-of-the-art general purpose models - Enformer and Sei - varies across the genome and is reduced in cell type specific accessible regions. Using accessibility models trained on cell types from specific tissues, we find that increasing model capacity to learn cell type specific regulatory syntax - through single-task learning or high capacity multi-task models - can improve performance in cell type specific accessible regions. We also observe that improving reference sequence predictions does not consistently improve variant effect predictions, indicating that novel strategies are needed to improve performance on variants. Conclusions: Our results provide a new perspective on the performance of genomic deep learning models, showing that performance varies across the genome and is particularly reduced in cell type specific accessible regions. We also identify strategies to maximize performance in cell type specific accessible regions.
Affiliation(s)
- Pooja Kathail
- Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA
- Richard W. Shuai
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA, USA
- Ryan Chung
- Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA
- Chun Jimmie Ye
- Division of Rheumatology, Department of Medicine, University of California, San Francisco, San Francisco, CA, USA
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
- Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, CA, USA
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Parker Institute for Cancer Immunotherapy, San Francisco, CA, USA
- Chan Zuckerberg Biohub, San Francisco, CA, USA
- Gabriel B. Loeb
- Division of Nephrology, Department of Medicine, University of California, San Francisco, San Francisco, CA, USA
- Cardiovascular Research Institute, University of California, San Francisco, San Francisco, CA, USA
- Nilah M. Ioannidis
- Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA, USA
- Chan Zuckerberg Biohub, San Francisco, CA, USA
20
Yin C, Hair SC, Byeon GW, Bromley P, Meuleman W, Seelig G. Iterative deep learning-design of human enhancers exploits condensed sequence grammar to achieve cell type-specificity. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.06.14.599076. [PMID: 38915713 PMCID: PMC11195158 DOI: 10.1101/2024.06.14.599076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/26/2024]
Abstract
An important and largely unsolved problem in synthetic biology is how to target gene expression to specific cell types. Here, we apply iterative deep learning to design synthetic enhancers with strong differential activity between two human cell lines. We initially train models on published datasets of enhancer activity and chromatin accessibility and use them to guide the design of synthetic enhancers that maximize predicted specificity. We experimentally validate these sequences, use the measurements to re-optimize the predictor, and design a second generation of enhancers with improved specificity. Our design methods embed relevant transcription factor binding site (TFBS) motifs with higher frequencies than comparable endogenous enhancers while using a more selective motif vocabulary, and we show that enhancer activity is correlated with transcription factor expression at the single cell level. Finally, we characterize causal features of top enhancers via perturbation experiments and show enhancers as short as 50bp can maintain specificity.
Affiliation(s)
- Christopher Yin
- Department of Electrical & Computer Engineering, University of Washington, Seattle, WA
- Gun Woo Byeon
- Department of Electrical & Computer Engineering, University of Washington, Seattle, WA
- Peter Bromley
- Altius Institute for Biomedical Sciences, Seattle, WA
- Wouter Meuleman
- Altius Institute for Biomedical Sciences, Seattle, WA
- Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA
- Georg Seelig
- Department of Electrical & Computer Engineering, University of Washington, Seattle, WA
- Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA
21
Shi Q, Song F, Zhou X, Chen X, Cao J, Na J, Fan Y, Zhang G, Zheng L. Early Predicting Osteogenic Differentiation of Mesenchymal Stem Cells Based on Deep Learning Within One Day. Ann Biomed Eng 2024; 52:1706-1718. [PMID: 38488988 DOI: 10.1007/s10439-024-03483-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2023] [Accepted: 02/24/2024] [Indexed: 03/17/2024]
Abstract
Osteogenic differentiation of mesenchymal stem cells (MSCs) is proposed to be critical for bone tissue engineering and regenerative medicine. However, the current approach for evaluating osteogenic differentiation mainly involves immunohistochemical staining of specific markers, which can often be detected at days 5-7 of osteogenic induction. Deep learning (DL) is a significant technology for realizing artificial intelligence (AI). Computer vision, a branch of AI, has been shown to achieve high-precision image recognition using convolutional neural networks (CNNs). Our goal was to train CNNs to quantitatively measure the osteogenic differentiation of MSCs. To this end, bright-field images of MSCs during early osteogenic differentiation (days 0, 1, 3, 5, and 7) were captured using a simple optical phase contrast microscope to train CNNs. The results showed that the CNNs could be trained to recognize undifferentiated cells and differentiating cells with an accuracy of 0.961 on the independent test set. In addition, we found that CNNs successfully distinguished differentiated cells at a very early stage (only 1 day). Further analysis showed that overall morphological features of MSCs were the main basis for the CNN classification. In conclusion, MSC differentiation detection can be achieved early and accurately through simple bright-field images and DL networks, which may also provide a potential and novel method for the field of cell detection in the near future.
Affiliation(s)
- Qiusheng Shi
- Key Laboratory of Biomechanics and Mechanobiology (Beihang University), Ministry of Education, Beijing Advanced Innovation Center for Biomedical Engineering, School of Biological Science and Medical Engineering, Beihang University, No. 37 Xueyuan Road, Haidian District, Beijing, 100191, China
- Fan Song
- Key Laboratory of Biomechanics and Mechanobiology (Beihang University), Ministry of Education, Beijing Advanced Innovation Center for Biomedical Engineering, School of Biological Science and Medical Engineering, Beihang University, No. 37 Xueyuan Road, Haidian District, Beijing, 100191, China
- Xiaocheng Zhou
- Department of Statistics, The Chinese University of Hong Kong, Sha Tin, Hong Kong SAR, China
- Xinyuan Chen
- Key Laboratory of Biomechanics and Mechanobiology (Beihang University), Ministry of Education, Beijing Advanced Innovation Center for Biomedical Engineering, School of Biological Science and Medical Engineering, Beihang University, No. 37 Xueyuan Road, Haidian District, Beijing, 100191, China
| | - Jingqi Cao
- Key Laboratory of Biomechanics and Mechanobiology (Beihang University), Ministry of Education, Beijing Advanced Innovation Center for Biomedical Engineering, School of Biological Science and Medical Engineering, Beihang University, No. 37 Xueyuan Road, Haidian District, Beijing, 100191, China
| | - Jing Na
- Key Laboratory of Biomechanics and Mechanobiology (Beihang University), Ministry of Education, Beijing Advanced Innovation Center for Biomedical Engineering, School of Biological Science and Medical Engineering, Beihang University, No. 37 Xueyuan Road, Haidian District, Beijing, 100191, China
| | - Yubo Fan
- Key Laboratory of Biomechanics and Mechanobiology (Beihang University), Ministry of Education, Beijing Advanced Innovation Center for Biomedical Engineering, School of Biological Science and Medical Engineering, Beihang University, No. 37 Xueyuan Road, Haidian District, Beijing, 100191, China.
| | - Guanglei Zhang
- Key Laboratory of Biomechanics and Mechanobiology (Beihang University), Ministry of Education, Beijing Advanced Innovation Center for Biomedical Engineering, School of Biological Science and Medical Engineering, Beihang University, No. 37 Xueyuan Road, Haidian District, Beijing, 100191, China.
| | - Lisha Zheng
- Key Laboratory of Biomechanics and Mechanobiology (Beihang University), Ministry of Education, Beijing Advanced Innovation Center for Biomedical Engineering, School of Biological Science and Medical Engineering, Beihang University, No. 37 Xueyuan Road, Haidian District, Beijing, 100191, China.
|
22
|
Pratt HE, Andrews G, Shedd N, Phalke N, Li T, Pampari A, Jensen M, Wen C, Consortium P, Gandal MJ, Geschwind DH, Gerstein M, Moore J, Kundaje A, Colubri A, Weng Z. Using a comprehensive atlas and predictive models to reveal the complexity and evolution of brain-active regulatory elements. Sci Adv 2024; 10:eadj4452. [PMID: 38781344 PMCID: PMC11114231 DOI: 10.1126/sciadv.adj4452] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2023] [Accepted: 04/25/2024] [Indexed: 05/25/2024]
Abstract
Most genetic variants associated with psychiatric disorders are located in noncoding regions of the genome. To investigate their functional implications, we integrate epigenetic data from the PsychENCODE Consortium and other published sources to construct a comprehensive atlas of candidate brain cis-regulatory elements. Using deep learning, we model these elements' sequence syntax and predict how binding sites for lineage-specific transcription factors contribute to cell type-specific gene regulation in various types of glia and neurons. The elements' evolutionary history suggests that new regulatory information in the brain emerges primarily via smaller sequence mutations within conserved mammalian elements rather than entirely new human- or primate-specific sequences. However, primate-specific candidate elements, particularly those active during fetal brain development and in excitatory neurons and astrocytes, are implicated in the heritability of brain-related human traits. Additionally, we introduce PsychSCREEN, a web-based platform offering interactive visualization of PsychENCODE-generated genetic and epigenetic data from diverse brain cell types in individuals with psychiatric disorders and healthy controls.
Affiliation(s)
- Henry E. Pratt
- Department of Genomics and Computational Biology, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
| | - Gregory Andrews
- Department of Genomics and Computational Biology, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
| | - Nicole Shedd
- Department of Genomics and Computational Biology, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
| | - Nishigandha Phalke
- Department of Genomics and Computational Biology, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
| | - Tongxin Li
- Department of Genomics and Computational Biology, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
- Khoury College of Computer Science, Northeastern University, Boston, MA 02115, USA
| | - Anusri Pampari
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Matthew Jensen
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
| | - Cindy Wen
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | | | - Michael J. Gandal
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Lifespan Brain Institute, The Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Daniel H. Geschwind
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Institute of Precision Health, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Mark Gerstein
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Department of Computer Science, Yale University, New Haven, CT 06520, USA
- Department of Statistics and Data Science, Yale University, New Haven, CT 06520, USA
| | - Jill Moore
- Department of Genomics and Computational Biology, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
| | - Anshul Kundaje
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Department of Computer Science, Stanford University, Stanford, CA 94305, USA
| | - Andrés Colubri
- Department of Genomics and Computational Biology, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
| | - Zhiping Weng
- Department of Genomics and Computational Biology, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
|
23
|
Duncan AG, Mitchell JA, Moses AM. Improving the performance of supervised deep learning for regulatory genomics using phylogenetic augmentation. Bioinformatics 2024; 40:btae190. [PMID: 38588559 PMCID: PMC11042905 DOI: 10.1093/bioinformatics/btae190] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2023] [Revised: 01/12/2024] [Accepted: 04/05/2024] [Indexed: 04/10/2024] Open
Abstract
MOTIVATION: Supervised deep learning is used to model the complex relationship between genomic sequence and regulatory function. Understanding how these models make predictions can provide biological insight into regulatory functions. Given the complexity of the sequence-to-regulatory-function mapping (the cis-regulatory code), it has been suggested that the genome contains insufficient sequence variation to train models with suitable complexity. Data augmentation is a widely used approach to increase the data variation available for model training; however, current data augmentation methods for genomic sequence data are limited. RESULTS: Inspired by the success of comparative genomics, we show that augmenting genomic sequences with evolutionarily related sequences from other species, which we term phylogenetic augmentation, improves the performance of deep learning models trained on regulatory genomic sequences to predict high-throughput functional assay measurements. Additionally, we show that phylogenetic augmentation can rescue model performance when the training set is down-sampled and permits deep learning on a real-world small dataset, demonstrating that this approach improves data efficiency. Overall, this data augmentation method represents a solution for improving model performance that is applicable to many supervised deep-learning problems in genomics. AVAILABILITY AND IMPLEMENTATION: The open-source GitHub repository agduncan94/phylogenetic_augmentation_paper includes the code for rerunning the analyses here and recreating the figures.
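The idea of phylogenetic augmentation described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation (their code lives in the repository named above); the function name `phylo_augment`, the `p_swap` parameter, and the toy sequences are hypothetical, and the sketch assumes that a homologous sequence simply inherits the label measured for the original sequence.

```python
import random


def phylo_augment(batch, homologs, rng, p_swap=0.5):
    """Return a copy of the batch in which sequences may be swapped for homologs.

    batch:    list of (seq_id, sequence, label) tuples
    homologs: dict mapping seq_id -> list of evolutionarily related sequences
    rng:      random.Random instance, passed in for reproducibility
    p_swap:   probability of replacing a sequence (hypothetical parameter)
    """
    augmented = []
    for seq_id, seq, label in batch:
        candidates = homologs.get(seq_id, [])
        # Assumption: the homolog keeps the label of the original sequence.
        if candidates and rng.random() < p_swap:
            seq = rng.choice(candidates)
        augmented.append((seq_id, seq, label))
    return augmented


# Toy data: one enhancer has homologs from other species, the other has none.
batch = [("enh1", "ACGTAC", 1.2), ("enh2", "TTGACA", 0.3)]
homologs = {"enh1": ["ACGTAT", "ACATAC"]}
augmented = phylo_augment(batch, homologs, random.Random(0), p_swap=1.0)
```

With `p_swap=1.0`, every sequence that has homologs is replaced, while sequences without homologs pass through unchanged, so the batch size and labels stay the same.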
Affiliation(s)
- Andrew G Duncan
- Cell & Systems Biology, University of Toronto, Toronto, ON M5S 3G5, Canada
| | | | - Alan M Moses
- Cell & Systems Biology, University of Toronto, Toronto, ON M5S 3G5, Canada
|
24
|
Tostado CP, Da Ong LX, Heng JJW, Miccolis C, Chia S, Seow JJW, Toh Y, DasGupta R. An AI-assisted integrated, scalable, single-cell phenomic-transcriptomic platform to elucidate intratumor heterogeneity against immune response. Bioeng Transl Med 2024; 9:e10628. [PMID: 38435825 PMCID: PMC10905538 DOI: 10.1002/btm2.10628] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Accepted: 11/16/2023] [Indexed: 03/05/2024] Open
Abstract
We present a novel framework combining single-cell phenotypic data with single-cell transcriptomic analysis to identify factors underpinning heterogeneity in antitumor immune response. We developed a pairwise, tumor-immune discretized interaction assay between natural killer (NK-92MI) cells and patient-derived head and neck squamous cell carcinoma (HNSCC) cell lines on a microfluidic cell-trapping platform. Furthermore, we generated a deep-learning computer vision algorithm that is capable of automating the acquisition and analysis of a large, live-cell imaging data set (>1 million) of paired tumor-immune interactions spanning a time course of 24 h across multiple HNSCC lines (n = 10). Finally, we combined the response data measured by Kaplan-Meier survival analysis against NK-mediated killing with downstream single-cell transcriptomic analysis to interrogate molecular signatures associated with NK-effector response. As proof-of-concept for the proposed framework, we efficiently identified MHC class I-driven cytotoxic resistance as a key mechanism for immune evasion in nonresponders, while enhanced expression of cell adhesion molecules was found to be correlated with sensitivity against NK-mediated cytotoxicity. We conclude that this integrated, data-driven phenotypic approach holds tremendous promise in advancing the rapid identification of new mechanisms and therapeutic targets related to immune evasion and response.
Affiliation(s)
- Christopher P. Tostado
- Genome Institute of Singapore, Laboratory of Precision Oncology and Cancer Evolution, Singapore, Singapore
- Institute for Health Innovation and Technology (iHealthtech), National University of Singapore, Singapore, Singapore
| | - Lucas Xian Da Ong
- Institute for Health Innovation and Technology (iHealthtech), National University of Singapore, Singapore, Singapore
| | - Joel Jia Wei Heng
- Genome Institute of Singapore, Laboratory of Precision Oncology and Cancer Evolution, Singapore, Singapore
| | - Carlo Miccolis
- Genome Institute of Singapore, Laboratory of Precision Oncology and Cancer Evolution, Singapore, Singapore
| | - Shumei Chia
- Genome Institute of Singapore, Laboratory of Precision Oncology and Cancer Evolution, Singapore, Singapore
| | - Justine Jia Wen Seow
- Genome Institute of Singapore, Laboratory of Precision Oncology and Cancer Evolution, Singapore, Singapore
| | - Yi‐Chin Toh
- Institute for Health Innovation and Technology (iHealthtech), National University of Singapore, Singapore, Singapore
- School of Mechanical, Medical and Process Engineering, Queensland University of Technology, Brisbane, Australia
- Centre for Biomedical Technologies, Queensland University of Technology, Brisbane, Australia
| | - Ramanuj DasGupta
- Genome Institute of Singapore, Laboratory of Precision Oncology and Cancer Evolution, Singapore, Singapore
|
25
|
Rauluseviciute I, Riudavets-Puig R, Blanc-Mathieu R, Castro-Mondragon J, Ferenc K, Kumar V, Lemma RB, Lucas J, Chèneby J, Baranasic D, Khan A, Fornes O, Gundersen S, Johansen M, Hovig E, Lenhard B, Sandelin A, Wasserman W, Parcy F, Mathelier A. JASPAR 2024: 20th anniversary of the open-access database of transcription factor binding profiles. Nucleic Acids Res 2024; 52:D174-D182. [PMID: 37962376 PMCID: PMC10767809 DOI: 10.1093/nar/gkad1059] [Citation(s) in RCA: 241] [Impact Index Per Article: 241.0] [Received: 09/15/2023] [Revised: 10/20/2023] [Accepted: 10/31/2023] [Indexed: 11/15/2023] Open
Abstract
JASPAR (https://jaspar.elixir.no/) is a widely-used open-access database presenting manually curated high-quality and non-redundant DNA-binding profiles for transcription factors (TFs) across taxa. In this 10th release and 20th-anniversary update, the CORE collection has expanded with 329 new profiles. We updated three existing profiles and provided orthogonal support for 72 profiles from the previous release's UNVALIDATED collection. Altogether, the JASPAR 2024 update provides a 20% increase in CORE profiles from the previous release. A trimming algorithm enhanced profiles by removing low information content flanking base pairs, which were likely uninformative (within the capacity of the PFM models) for TFBS predictions and modelling TF-DNA interactions. This release includes enhanced metadata, featuring a refined classification for plant TFs' structural DNA-binding domains. The new JASPAR collections prompt updates to the genomic tracks of predicted TF binding sites (TFBSs) in 8 organisms, with human and mouse tracks available as native tracks in the UCSC Genome browser. All data are available through the JASPAR web interface and programmatically through its API and the updated Bioconductor and pyJASPAR packages. Finally, a new TFBS extraction tool enables users to retrieve predicted JASPAR TFBSs intersecting their genomic regions of interest.
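The trimming of low-information flanking positions mentioned in this abstract can be sketched in Python as follows. This is an illustration under stated assumptions, not JASPAR's actual algorithm: the function names, the `min_ic` threshold of 0.5 bits, the uniform background, and the toy count matrix are all hypothetical. Information content per position is computed as 2 bits minus the Shannon entropy of the base frequencies.

```python
import math


def information_content(column):
    """Information content (bits) of one PFM position, uniform background assumed."""
    total = sum(column)
    ic = 2.0  # maximum for a four-letter alphabet
    for count in column:
        p = count / total
        if p > 0:
            ic += p * math.log2(p)
    return ic


def trim_flanks(pfm, min_ic=0.5):
    """Drop leading and trailing positions whose information content is below min_ic.

    pfm:    list of positions, each a list of [A, C, G, T] counts (non-empty columns)
    min_ic: hypothetical threshold in bits; internal positions are never removed
    """
    ics = [information_content(col) for col in pfm]
    start, end = 0, len(pfm)
    while start < end and ics[start] < min_ic:
        start += 1
    while end > start and ics[end - 1] < min_ic:
        end -= 1
    return pfm[start:end]


# Toy matrix: uninformative first and last positions around a 3-bp core.
pfm = [
    [25, 25, 25, 25],
    [100, 0, 0, 0],
    [0, 50, 50, 0],
    [0, 0, 0, 100],
    [30, 20, 25, 25],
]
trimmed = trim_flanks(pfm)
```

Only the flanks are inspected, so a weakly informative position inside the motif core is retained; that choice mirrors the abstract's description of removing flanking base pairs, while the threshold itself is a placeholder.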
Affiliation(s)
- Ieva Rauluseviciute
- Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
| | - Rafael Riudavets-Puig
- Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
| | - Romain Blanc-Mathieu
- Laboratoire Physiologie Cellulaire et Végétale, Univ. Grenoble Alpes, CNRS, CEA, INRAE, IRIG-DBSCI-LPCV, 17 avenue des martyrs, F-38054, Grenoble, France
| | - Jaime A Castro-Mondragon
- Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
| | - Katalin Ferenc
- Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
| | - Vipin Kumar
- Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
| | - Roza Berhanu Lemma
- Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
| | - Jérémy Lucas
- Laboratoire Physiologie Cellulaire et Végétale, Univ. Grenoble Alpes, CNRS, CEA, INRAE, IRIG-DBSCI-LPCV, 17 avenue des martyrs, F-38054, Grenoble, France
| | - Jeanne Chèneby
- Center for Bioinformatics, Department of Informatics, University of Oslo, Oslo, Norway
| | - Damir Baranasic
- MRC London Institute of Medical Sciences, Du Cane Road, London W12 0NN, UK
- Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, Hammersmith Hospital Campus, Du Cane Road, London W12 0NN, UK
- Division of Electronics, Ruđer Bošković Institute, Bijenička cesta, 10000 Zagreb, Croatia
| | - Aziz Khan
- Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
- Stanford Cancer Institute, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Oriol Fornes
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children's Hospital Research Institute, University of British Columbia, 950 W 28th Ave, Vancouver, BC V5Z 4H4, Canada
| | - Sveinung Gundersen
- Center for Bioinformatics, Department of Informatics, University of Oslo, Oslo, Norway
| | - Morten Johansen
- Center for Bioinformatics, Department of Informatics, University of Oslo, Oslo, Norway
| | - Eivind Hovig
- Center for Bioinformatics, Department of Informatics, University of Oslo, Oslo, Norway
- Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital, 0424 Oslo, Norway
| | - Boris Lenhard
- MRC London Institute of Medical Sciences, Du Cane Road, London W12 0NN, UK
- Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, Hammersmith Hospital Campus, Du Cane Road, London W12 0NN, UK
| | - Albin Sandelin
- Department of Biology and Biotech Research and Innovation Centre, University of Copenhagen, Ole Maaløes Vej 5, DK2200 Copenhagen N, Denmark
| | - Wyeth W Wasserman
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children's Hospital Research Institute, University of British Columbia, 950 W 28th Ave, Vancouver, BC V5Z 4H4, Canada
| | - François Parcy
- Laboratoire Physiologie Cellulaire et Végétale, Univ. Grenoble Alpes, CNRS, CEA, INRAE, IRIG-DBSCI-LPCV, 17 avenue des martyrs, F-38054, Grenoble, France
| | - Anthony Mathelier
- Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
- Center for Bioinformatics, Department of Informatics, University of Oslo, Oslo, Norway
- Department of Medical Genetics, Institute of Clinical Medicine, University of Oslo and Oslo University Hospital, Oslo, Norway
|
26
|
Pechmann S. Single-cell expression predicts neuron-specific protein homeostasis networks. Open Biol 2024; 14:230386. [PMID: 38262604 PMCID: PMC10805596 DOI: 10.1098/rsob.230386] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2023] [Accepted: 11/17/2023] [Indexed: 01/25/2024] Open
Abstract
The protein homeostasis network keeps proteins in their correct shapes and avoids unwanted aggregation. In turn, the accumulation of aberrantly misfolded proteins has been directly associated with the onset of ageing-associated neurodegenerative diseases such as Alzheimer's and Parkinson's. However, a detailed and rational understanding of how protein homeostasis is achieved in health, and how it can be targeted for therapeutic intervention in disease, is still missing. Here, large-scale single-cell expression data from the Allen Brain Map are analysed to investigate the transcriptional regulation of the core protein homeostasis network across the human brain. Remarkably, distinct expression profiles suggest specialized protein homeostasis networks with systematic adaptations in excitatory neurons, inhibitory neurons and non-neuronal cells. Moreover, several chaperones and ubiquitin ligases are found to be transcriptionally coregulated with genes important for synapse formation and maintenance, thus linking protein homeostasis to the regulation of neuronal function. Finally, evolutionary analyses highlight the conservation of an elevated interaction density in the chaperone network, suggesting that one of the most exciting aspects of chaperone action may yet be discovered in their collective action at the systems level. More generally, our work highlights the power of computational analyses for breaking down complexity and gaining complementary insights into fundamental biological problems.
|
27
|
Sasse A, Ng B, Spiro AE, Tasaki S, Bennett DA, Gaiteri C, De Jager PL, Chikina M, Mostafavi S. Benchmarking of deep neural networks for predicting personal gene expression from DNA sequence highlights shortcomings. Nat Genet 2023; 55:2060-2064. [PMID: 38036778 DOI: 10.1038/s41588-023-01524-6] [Citation(s) in RCA: 29] [Impact Index Per Article: 14.5] [Received: 03/13/2023] [Accepted: 09/08/2023] [Indexed: 12/02/2023]
Abstract
Deep learning methods have recently become the state of the art in a variety of regulatory genomic tasks [1-6], including the prediction of gene expression from genomic DNA. As such, these methods promise to serve as important tools in interpreting the full spectrum of genetic variation observed in personal genomes. Previous evaluation strategies have assessed their predictions of gene expression across genomic regions; however, systematic benchmarking is lacking to assess their predictions across individuals, which would directly evaluate their utility as personal DNA interpreters. We used paired whole genome sequencing and gene expression from 839 individuals in the ROSMAP study [7] to evaluate the ability of current methods to predict gene expression variation across individuals at varied loci. Our approach identifies a limitation of current methods to correctly predict the direction of variant effects. We show that this limitation stems from insufficiently learned sequence motif grammar and suggest new model training strategies to improve performance.
Affiliation(s)
- Alexander Sasse
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
| | - Bernard Ng
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, USA
| | - Anna E Spiro
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
| | - Shinya Tasaki
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, USA
| | - David A Bennett
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, USA
| | - Christopher Gaiteri
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, USA
- Department of Psychiatry, SUNY Upstate Medical University, Syracuse, NY, USA
| | - Philip L De Jager
- Center for Translational & Computational Neuroimmunology, Department of Neurology, and the Taub Institute for the Study of Alzheimer's Disease and the Aging Brain, Columbia University Irving Medical Center, New York, NY, USA
| | - Maria Chikina
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA.
| | - Sara Mostafavi
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA.
- Canadian Institute for Advanced Research, Toronto, Ontario, Canada.
|
28
|
Nair S, Ameen M, Sundaram L, Pampari A, Schreiber J, Balsubramani A, Wang YX, Burns D, Blau HM, Karakikes I, Wang KC, Kundaje A. Transcription factor stoichiometry, motif affinity and syntax regulate single-cell chromatin dynamics during fibroblast reprogramming to pluripotency. bioRxiv 2023:2023.10.04.560808. [PMID: 37873116 PMCID: PMC10592962 DOI: 10.1101/2023.10.04.560808] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 10/25/2023]
Abstract
Ectopic expression of OCT4, SOX2, KLF4 and MYC (OSKM) transforms differentiated cells into induced pluripotent stem cells. To refine our mechanistic understanding of reprogramming, especially during the early stages, we profiled chromatin accessibility and gene expression at single-cell resolution across a densely sampled time course of human fibroblast reprogramming. Using neural networks that map DNA sequence to ATAC-seq profiles at base-resolution, we annotated cell-state-specific predictive transcription factor (TF) motif syntax in regulatory elements, inferred affinity- and concentration-dependent dynamics of Tn5-bias corrected TF footprints, linked peaks to putative target genes, and elucidated rewiring of TF-to-gene cis-regulatory networks. Our models reveal that early in reprogramming, OSK, at supraphysiological concentrations, rapidly open transient regulatory elements by occupying non-canonical low-affinity binding sites. As OSK concentration falls, the accessibility of these transient elements decays as a function of motif affinity. We find that these OSK-dependent transient elements sequester the somatic TF AP-1. This redistribution is strongly associated with the silencing of fibroblast-specific genes within individual nuclei. Together, our integrated single-cell resource and models reveal insights into the cis-regulatory code of reprogramming at unprecedented resolution, connect TF stoichiometry and motif syntax to diversification of cell fate trajectories, and provide new perspectives on the dynamics and role of transient regulatory elements in somatic silencing.
Affiliation(s)
- Surag Nair
- Department of Computer Science, Stanford University, Stanford, CA, USA
| | - Mohamed Ameen
- Department of Cancer Biology, Stanford University, Stanford, CA, USA
- Cardiovascular Institute, Stanford University, Stanford, CA, USA
- Department of Dermatology, Stanford University, Stanford, CA, USA
- Program in Epithelial Biology, Stanford University, Stanford, CA, USA
| | | | - Anusri Pampari
- Department of Computer Science, Stanford University, Stanford, CA, USA
| | - Jacob Schreiber
- Department of Genetics, Stanford University, Stanford, CA, USA
| | | | - Yu Xin Wang
- Baxter Laboratory for Stem Cell Biology, Stanford University, Stanford, CA, USA
| | - David Burns
- Baxter Laboratory for Stem Cell Biology, Stanford University, Stanford, CA, USA
| | - Helen M Blau
- Baxter Laboratory for Stem Cell Biology, Stanford University, Stanford, CA, USA
- Department of Microbiology and Immunology, Stanford University, Stanford, CA, USA
| | - Ioannis Karakikes
- Cardiovascular Institute, Stanford University, Stanford, CA, USA
- Department of Cardiothoracic Surgery, Stanford University, Stanford, CA, USA
| | - Kevin C Wang
- Department of Dermatology, Stanford University, Stanford, CA, USA
- Program in Epithelial Biology, Stanford University, Stanford, CA, USA
- Veterans Affairs Palo Alto Healthcare System, Palo Alto, CA, USA
| | - Anshul Kundaje
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Department of Genetics, Stanford University, Stanford, CA, USA
|
29
|
Brennan KJ, Weilert M, Krueger S, Pampari A, Liu HY, Yang AWH, Morrison JA, Hughes TR, Rushlow CA, Kundaje A, Zeitlinger J. Chromatin accessibility in the Drosophila embryo is determined by transcription factor pioneering and enhancer activation. Dev Cell 2023; 58:1898-1916.e9. [PMID: 37557175 PMCID: PMC10592203 DOI: 10.1016/j.devcel.2023.07.007] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 12/28/2022] [Revised: 05/09/2023] [Accepted: 07/13/2023] [Indexed: 08/11/2023]
Abstract
Chromatin accessibility is integral to the process by which transcription factors (TFs) read out cis-regulatory DNA sequences, but it is difficult to differentiate between TFs that drive accessibility and those that do not. Deep learning models that learn complex sequence rules provide an unprecedented opportunity to dissect this problem. Using zygotic genome activation in Drosophila as a model, we analyzed high-resolution TF binding and chromatin accessibility data with interpretable deep learning and performed genetic validation experiments. We identify a hierarchical relationship between the pioneer TF Zelda and the TFs involved in axis patterning. Zelda consistently pioneers chromatin accessibility proportional to motif affinity, whereas patterning TFs augment chromatin accessibility in sequence contexts where they mediate enhancer activation. We conclude that chromatin accessibility occurs in two tiers: one through pioneering, which makes enhancers accessible but not necessarily active, and the second when the correct combination of TFs leads to enhancer activation.
Affiliation(s)
- Kaelan J Brennan
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA
| | - Melanie Weilert
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA
| | - Sabrina Krueger
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA
| | - Anusri Pampari
- Department of Computer Science, Stanford University, Palo Alto, CA 94305, USA
| | - Hsiao-Yun Liu
- Department of Biology, New York University, New York, NY 10003, USA
| | - Ally W H Yang
- Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada
| | - Jason A Morrison
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA
| | - Timothy R Hughes
- Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada
| | | | - Anshul Kundaje
- Department of Computer Science, Stanford University, Palo Alto, CA 94305, USA; Department of Genetics, Stanford University, Palo Alto, CA 94305, USA
| | - Julia Zeitlinger
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA; Department of Pathology & Laboratory Medicine, The University of Kansas Medical Center, Kansas City, KS 66160, USA.
|
30
|
Hepkema J, Lee NK, Stewart BJ, Ruangroengkulrith S, Charoensawan V, Clatworthy MR, Hemberg M. Predicting the impact of sequence motifs on gene regulation using single-cell data. Genome Biol 2023; 24:189. [PMID: 37582793 PMCID: PMC10426127 DOI: 10.1186/s13059-023-03021-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2022] [Accepted: 07/21/2023] [Indexed: 08/17/2023] Open
Abstract
The binding of transcription factors at proximal promoters and distal enhancers is central to gene regulation. Identifying regulatory motifs and quantifying their impact on expression remains challenging. Using a convolutional neural network trained on single-cell data, we infer putative regulatory motifs and cell type-specific importance. Our model, scover, explains 29% of the variance in gene expression in multiple mouse tissues. Applying scover to distal enhancers identified using scATAC-seq from the developing human brain, we identify cell type-specific motif activities in distal enhancers. Scover can identify regulatory motifs and their importance from single-cell data where all parameters and outputs are easily interpretable.
Affiliation(s)
- Jacob Hepkema
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
| | - Nicholas Keone Lee
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- The Gurdon Institute, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QN, UK
| | - Benjamin J Stewart
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Molecular Immunity Unit, Department of Medicine, University of Cambridge, Cambridge, CB2 0QQ, UK
- Cambridge University Hospitals NHS Foundation Trust and NIHR Cambridge Biomedical Research Centre, Cambridge, CB2 0QQ, UK
| | - Siwat Ruangroengkulrith
- Department of Biochemistry, Faculty of Science, Mahidol University, Bangkok, 10400, Thailand
| | - Varodom Charoensawan
- Department of Biochemistry, Faculty of Science, Mahidol University, Bangkok, 10400, Thailand
- Integrative Computational BioScience (ICBS) Center, Mahidol University, Nakhon Pathom, 7310, Thailand
- Systems Biology of Diseases Research Unit, Faculty of Science, Mahidol University, Bangkok, 10400, Thailand
| | - Menna R Clatworthy
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Molecular Immunity Unit, Department of Medicine, University of Cambridge, Cambridge, CB2 0QQ, UK
- Cambridge University Hospitals NHS Foundation Trust and NIHR Cambridge Biomedical Research Centre, Cambridge, CB2 0QQ, UK
| | - Martin Hemberg
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK.
- The Gurdon Institute, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QN, UK.
- Gene Lay Institute of Immunology and Inflammation, Brigham and Women's Hospital, Massachusetts General Hospital, and Harvard Medical School, Boston, MA, 02115, USA.
|
31
|
Monti R, Ohler U. Toward Identification of Functional Sequences and Variants in Noncoding DNA. Annu Rev Biomed Data Sci 2023; 6:191-210. [PMID: 37262323 DOI: 10.1146/annurev-biodatasci-122120-110102] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/03/2023]
Abstract
Understanding the noncoding part of the genome, which encodes gene regulation, is necessary to identify genetic mechanisms of disease and translate findings from genome-wide association studies into actionable results for treatments and personalized care. Here we provide an overview of the computational analysis of noncoding regions, starting from gene-regulatory mechanisms and their representation in data. Deep learning methods, when applied to these data, highlight important regulatory sequence elements and predict the functional effects of genetic variants. These and other algorithms are used to predict damaging sequence variants. Finally, we introduce rare-variant association tests that incorporate functional annotations and predictions in order to increase interpretability and statistical power.
Affiliation(s)
- Remo Monti
- Max Delbrück Center for Molecular Medicine (MDC), Helmholtz Association of German Research Centers, Berlin Institute for Medical Systems Biology (BIMSB), Berlin, Germany
- Digital Health-Machine Learning, Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam, Germany
| | - Uwe Ohler
- Max Delbrück Center for Molecular Medicine (MDC), Helmholtz Association of German Research Centers, Berlin Institute for Medical Systems Biology (BIMSB), Berlin, Germany
|
32
|
Chowdhary K, Benoist C. A variegated model of transcription factor function in the immune system. Trends Immunol 2023; 44:530-541. [PMID: 37258360 PMCID: PMC10332489 DOI: 10.1016/j.it.2023.05.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/28/2023] [Revised: 04/26/2023] [Accepted: 05/01/2023] [Indexed: 06/02/2023]
Abstract
Specific combinations of transcription factors (TFs) control the gene expression programs that underlie specialized immune responses. Previous models of TF function in immunocytes had restricted each TF to a single functional categorization [e.g., lineage-defining (LDTFs) vs. signal-dependent TFs (SDTFs)] within one cell type. Synthesizing recent results, we instead propose a variegated model of immunological TF function, whereby many TFs have flexible and different roles across distinct cell states, contributing to cell phenotypic diversity. We discuss evidence in support of this variegated model, describe contextual inputs that enable TF diversification, and look to the future to imagine warranted experimental and computational tools to build quantitative and predictive models of immunocyte gene regulatory networks.
|
33
|
Penhaskashi J, Sekimoto O, Chiappelli F. Permafrost viremia and immune tweening. Bioinformation 2023; 19:685-691. [PMID: 37885785 PMCID: PMC10598357 DOI: 10.6026/97320630019685] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/01/2023] [Revised: 06/30/2023] [Accepted: 06/30/2023] [Indexed: 10/28/2023] Open
Abstract
The immune system, an exquisitely regulated physiological system, utilizes a wide spectrum of soluble factors and multiple cell populations and subpopulations at diverse states of maturation to monitor and protect the organism against foreign organisms. Immune surveillance is ensured by distinguishing self-antigens from self-associated with non-self (e.g., viral) peptides presented by major histocompatibility complexes (MHC). Pathology is often identified as unregulated inflammatory responses (e.g., cytokine storm), or recognizing self as a non-self entity (i.e., auto-immunity). Artificial intelligence (AI), and in particular specific machine learning (ML) paradigms (e.g., Deep Learning [DL]) proffer powerful algorithms to better understand and more accurately predict immune responses, immune regulation and homeostasis, and immune reactivity to challenges (i.e., immune allostasis) by their intrinsic ability to interpret immune parameters, pathways and events by analyzing large amounts of complex data and drawing predictive inferences (i.e., immune tweening). We propose here that DL models play an increasingly significant role in better defining and characterizing immunological surveillance to ancient and novel virus species released by thawing permafrost.
Affiliation(s)
- Jaden Penhaskashi
- Division of West Valley Dental Implant Center, Encino, CA 91316, USA
| | - Francesco Chiappelli
- Dental Group of Sherman Oaks, CA 91403, USA
- Center for the Health Sciences, UCLA, Los Angeles, CA, USA
|
34
|
Balcı AT, Ebeid MM, Benos PV, Kostka D, Chikina M. An intrinsically interpretable neural network architecture for sequence-to-function learning. Bioinformatics 2023; 39:i413-i422. [PMID: 37387140 PMCID: PMC10311317 DOI: 10.1093/bioinformatics/btad271] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/01/2023] Open
Abstract
MOTIVATION: Sequence-based deep learning approaches have been shown to predict a multitude of functional genomic readouts, including regions of open chromatin and RNA expression of genes. However, a major limitation of current methods is that model interpretation relies on computationally demanding post hoc analyses, and even then, one can often not explain the internal mechanics of highly parameterized models. Here, we introduce a deep learning architecture called totally interpretable sequence-to-function model (tiSFM). tiSFM improves upon the performance of standard multilayer convolutional models while using fewer parameters. Additionally, while tiSFM is itself technically a multilayer neural network, internal model parameters are intrinsically interpretable in terms of relevant sequence motifs. RESULTS: We analyze published open chromatin measurements across hematopoietic lineage cell-types and demonstrate that tiSFM outperforms a state-of-the-art convolutional neural network model custom-tailored to this dataset. We also show that it correctly identifies context-specific activities of transcription factors with known roles in hematopoietic differentiation, including Pax5 and Ebf1 for B-cells, and Rorc for innate lymphoid cells. tiSFM's model parameters have biologically meaningful interpretations, and we show the utility of our approach on a complex task of predicting the change in epigenetic state as a function of developmental transition. AVAILABILITY AND IMPLEMENTATION: The source code, including scripts for the analysis of key findings, can be found at https://github.com/boooooogey/ATAConv, implemented in Python.
Affiliation(s)
- Ali Tuğrul Balcı
- Joint Carnegie Mellon University-University of Pittsburgh Program in Computational Biology, Pittsburgh, PA 15213, United States
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA 15213, United States
| | - Mark Maher Ebeid
- Joint Carnegie Mellon University-University of Pittsburgh Program in Computational Biology, Pittsburgh, PA 15213, United States
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA 15213, United States
| | - Panayiotis V Benos
- Department of Epidemiology, University of Florida, Gainesville, FL 32610, United States
| | - Dennis Kostka
- Joint Carnegie Mellon University-University of Pittsburgh Program in Computational Biology, Pittsburgh, PA 15213, United States
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA 15213, United States
- Department of Developmental Biology, University of Pittsburgh, Pittsburgh, PA 15213, United States
| | - Maria Chikina
- Joint Carnegie Mellon University-University of Pittsburgh Program in Computational Biology, Pittsburgh, PA 15213, United States
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA 15213, United States
|
35
|
Novakovsky G, Fornes O, Saraswat M, Mostafavi S, Wasserman WW. ExplaiNN: interpretable and transparent neural networks for genomics. Genome Biol 2023; 24:154. [PMID: 37370113 DOI: 10.1186/s13059-023-02985-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 11/20/2022] [Accepted: 06/12/2023] [Indexed: 06/29/2023] Open
Abstract
Deep learning models such as convolutional neural networks (CNNs) excel in genomic tasks but lack interpretability. We introduce ExplaiNN, which combines the expressiveness of CNNs with the interpretability of linear models. ExplaiNN can predict TF binding, chromatin accessibility, and de novo motifs, achieving performance comparable to state-of-the-art methods. Its predictions are transparent, providing global (cell state level) as well as local (individual sequence level) biological insights into the data. ExplaiNN can serve as a plug-and-play platform for pretrained models and annotated position weight matrices. ExplaiNN aims to accelerate the adoption of deep learning in genomic sequence analysis by domain experts.
Affiliation(s)
- Gherman Novakovsky
- Department of Medical Genetics, Centre for Molecular Medicine and Therapeutics, BC Children's Hospital Research Institute, University of British Columbia, Vancouver, BC, Canada
| | - Oriol Fornes
- Department of Medical Genetics, Centre for Molecular Medicine and Therapeutics, BC Children's Hospital Research Institute, University of British Columbia, Vancouver, BC, Canada
| | - Manu Saraswat
- Department of Medical Genetics, Centre for Molecular Medicine and Therapeutics, BC Children's Hospital Research Institute, University of British Columbia, Vancouver, BC, Canada
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
| | - Sara Mostafavi
- Paul G. Allen School of Computer Science and Engineering, University of Washington (UW), Seattle, USA
| | - Wyeth W Wasserman
- Department of Medical Genetics, Centre for Molecular Medicine and Therapeutics, BC Children's Hospital Research Institute, University of British Columbia, Vancouver, BC, Canada.
|
36
|
Janizek JD, Dincer AB, Celik S, Chen H, Chen W, Naxerova K, Lee SI. Uncovering expression signatures of synergistic drug responses via ensembles of explainable machine-learning models. Nat Biomed Eng 2023; 7:811-829. [PMID: 37127711 PMCID: PMC11149694 DOI: 10.1038/s41551-023-01034-0] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 10/12/2021] [Accepted: 04/01/2023] [Indexed: 05/03/2023]
Abstract
Machine learning may aid the choice of optimal combinations of anticancer drugs by explaining the molecular basis of their synergy. By combining accurate models with interpretable insights, explainable machine learning promises to accelerate data-driven cancer pharmacology. However, owing to the highly correlated and high-dimensional nature of transcriptomic data, naively applying current explainable machine-learning strategies to large transcriptomic datasets leads to suboptimal outcomes. Here by using feature attribution methods, we show that the quality of the explanations can be increased by leveraging ensembles of explainable machine-learning models. We applied the approach to a dataset of 133 combinations of 46 anticancer drugs tested in ex vivo tumour samples from 285 patients with acute myeloid leukaemia and uncovered a haematopoietic-differentiation signature underlying drug combinations with therapeutic synergy. Ensembles of machine-learning models trained to predict drug combination synergies on the basis of gene-expression data may improve the feature attribution quality of complex machine-learning models.
Affiliation(s)
- Joseph D Janizek
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
- Medical Scientist Training Program, University of Washington, Seattle, WA, USA
| | - Ayse B Dincer
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
| | - Safiye Celik
- Recursion Pharmaceuticals, Salt Lake City, UT, USA
| | - Hugh Chen
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
| | - William Chen
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
| | - Kamila Naxerova
- Center for Systems Biology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA.
- Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA.
| | - Su-In Lee
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA.
|
37
|
Herrera-Uribe J, Lim KS, Byrne KA, Daharsh L, Liu H, Corbett RJ, Marco G, Schroyen M, Koltes JE, Loving CL, Tuggle CK. Integrative profiling of gene expression and chromatin accessibility elucidates specific transcriptional networks in porcine neutrophils. Front Genet 2023; 14:1107462. [PMID: 37287538 PMCID: PMC10242145 DOI: 10.3389/fgene.2023.1107462] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/24/2022] [Accepted: 04/27/2023] [Indexed: 06/09/2023] Open
Abstract
Neutrophils are vital components of the immune system for limiting the invasion and proliferation of pathogens in the body. Surprisingly, the functional annotation of porcine neutrophils is still limited. The transcriptomic and epigenetic assessment of porcine neutrophils from healthy pigs was performed by bulk RNA sequencing and transposase accessible chromatin sequencing (ATAC-seq). First, we sequenced and compared the transcriptome of porcine neutrophils with eight other immune cell transcriptomes to identify a neutrophil-enriched gene list within a detected neutrophil co-expression module. Second, we used ATAC-seq analysis to report for the first time the genome-wide chromatin accessible regions of porcine neutrophils. A combined analysis using both transcriptomic and chromatin accessibility data further defined the neutrophil co-expression network controlled by transcription factors likely important for neutrophil lineage commitment and function. We identified chromatin accessible regions around promoters of neutrophil-specific genes that were predicted to be bound by neutrophil-specific transcription factors. Additionally, published DNA methylation data from porcine immune cells including neutrophils were used to link low DNA methylation patterns to accessible chromatin regions and genes with highly enriched expression in porcine neutrophils. In summary, our data provides the first integrative analysis of the accessible chromatin regions and transcriptional status of porcine neutrophils, contributing to the Functional Annotation of Animal Genomes (FAANG) project, and demonstrates the utility of chromatin accessible regions to identify and enrich our understanding of transcriptional networks in a cell type such as neutrophils.
Affiliation(s)
- Juber Herrera-Uribe
- Department of Animal Science, Iowa State University, Ames, IA, United States
| | - Kyu-Sang Lim
- Department of Animal Science, Iowa State University, Ames, IA, United States
- Department of Animal Resource Science, Kongju National University, Yesan, Republic of Korea
| | - Kristen A. Byrne
- USDA-Agriculture Research Service, National Animal Disease Center, Food Safety and Enteric Pathogens Research Unit, Ames, IA, United States
| | - Lance Daharsh
- Department of Animal Science, Iowa State University, Ames, IA, United States
| | - Haibo Liu
- Department of Animal Science, Iowa State University, Ames, IA, United States
| | - Ryan J. Corbett
- Department of Animal Science, Iowa State University, Ames, IA, United States
| | - Gianna Marco
- Department of Animal Science, Iowa State University, Ames, IA, United States
| | - Martine Schroyen
- Department of Animal Science, Iowa State University, Ames, IA, United States
| | - James E. Koltes
- Department of Animal Science, Iowa State University, Ames, IA, United States
| | - Crystal L. Loving
- USDA-Agriculture Research Service, National Animal Disease Center, Food Safety and Enteric Pathogens Research Unit, Ames, IA, United States
|
38
|
Balcı AT, Ebeid MM, Benos PV, Kostka D, Chikina M. An intrinsically interpretable neural network architecture for sequence to function learning. bioRxiv 2023:2023.01.25.525572. [PMID: 36747873 PMCID: PMC9900791 DOI: 10.1101/2023.01.25.525572] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/28/2023]
Abstract
MOTIVATION: Sequence-based deep learning approaches have been shown to predict a multitude of functional genomic readouts, including regions of open chromatin and RNA expression of genes. However, a major limitation of current methods is that model interpretation relies on computationally demanding post hoc analyses, and even then, one can often not explain the internal mechanics of highly parameterized models. Here, we introduce a deep learning architecture called tiSFM (totally interpretable sequence to function model). tiSFM improves upon the performance of standard multi-layer convolutional models while using fewer parameters. Additionally, while tiSFM is itself technically a multi-layer neural network, internal model parameters are intrinsically interpretable in terms of relevant sequence motifs. RESULTS: We analyze published open chromatin measurements across hematopoietic lineage cell-types and demonstrate that tiSFM outperforms a state-of-the-art convolutional neural network model custom-tailored to this dataset. We also show that it correctly identifies context specific activities of transcription factors with known roles in hematopoietic differentiation, including Pax5 and Ebf1 for B-cells, and Rorc for innate lymphoid cells. tiSFM's model parameters have biologically meaningful interpretations, and we show the utility of our approach on a complex task of predicting the change in epigenetic state as a function of developmental transition. AVAILABILITY AND IMPLEMENTATION: The source code, including scripts for the analysis of key findings, can be found at https://github.com/boooooogey/ATAConv, implemented in Python.
Affiliation(s)
- Ali Tuğrul Balcı
- Joint Carnegie Mellon University-University of Pittsburgh Program in Computational Biology, Pittsburgh, 15213, United States
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, 15213, United States
| | - Mark Maher Ebeid
- Joint Carnegie Mellon University-University of Pittsburgh Program in Computational Biology, Pittsburgh, 15213, United States
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, 15213, United States
| | - Panayiotis V Benos
- Department of Epidemiology, University of Florida, Gainesville, 32610, United States
| | - Dennis Kostka
- Joint Carnegie Mellon University-University of Pittsburgh Program in Computational Biology, Pittsburgh, 15213, United States
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, 15213, United States
| | - Maria Chikina
- Joint Carnegie Mellon University-University of Pittsburgh Program in Computational Biology, Pittsburgh, 15213, United States
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, 15213, United States
|
39
|
Novakovsky G, Dexter N, Libbrecht MW, Wasserman WW, Mostafavi S. Obtaining genetics insights from deep learning via explainable artificial intelligence. Nat Rev Genet 2023; 24:125-137. [PMID: 36192604 DOI: 10.1038/s41576-022-00532-2] [Citation(s) in RCA: 119] [Impact Index Per Article: 59.5] [Accepted: 08/31/2022] [Indexed: 01/24/2023]
Abstract
Artificial intelligence (AI) models based on deep learning now represent the state of the art for making functional predictions in genomics research. However, the underlying basis on which predictive models make such predictions is often unknown. For genomics researchers, this missing explanatory information would frequently be of greater value than the predictions themselves, as it can enable new insights into genetic processes. We review progress in the emerging area of explainable AI (xAI), a field with the potential to empower life science researchers to gain mechanistic insights into complex deep learning models. We discuss and categorize approaches for model interpretation, including an intuitive understanding of how each approach works and their underlying assumptions and limitations in the context of typical high-throughput biological datasets.
Affiliation(s)
- Gherman Novakovsky
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children's Hospital Research Institute, University of British Columbia, Vancouver, British Columbia, Canada; Bioinformatics Graduate Program, University of British Columbia, Vancouver, British Columbia, Canada
| | - Nick Dexter
- Department of Mathematics, Simon Fraser University, Burnaby, British Columbia, Canada; School of Computing Science, Simon Fraser University, Burnaby, British Columbia, Canada
| | - Maxwell W Libbrecht
- School of Computing Science, Simon Fraser University, Burnaby, British Columbia, Canada.
| | - Wyeth W Wasserman
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children's Hospital Research Institute, University of British Columbia, Vancouver, British Columbia, Canada.
| | - Sara Mostafavi
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA; Canadian Institute for Advanced Research, Toronto, Ontario, Canada.
|
40
|
Shin B, Rothenberg EV. Multi-modular structure of the gene regulatory network for specification and commitment of murine T cells. Front Immunol 2023; 14:1108368. [PMID: 36817475 PMCID: PMC9928580 DOI: 10.3389/fimmu.2023.1108368] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 11/26/2022] [Accepted: 01/11/2023] [Indexed: 02/04/2023] Open
Abstract
T cells develop from multipotent progenitors by a gradual process dependent on intrathymic Notch signaling and coupled with extensive proliferation. The stages leading them to T-cell lineage commitment are well characterized by single-cell and bulk RNA analyses of sorted populations and by direct measurements of precursor-product relationships. This process depends not only on Notch signaling but also on multiple transcription factors, some associated with stemness and multipotency, some with alternative lineages, and others associated with T-cell fate. These factors interact in opposing or semi-independent T cell gene regulatory network (GRN) subcircuits that are increasingly well defined. A newly comprehensive picture of this network has emerged. Importantly, because key factors in the GRN can bind to markedly different genomic sites at one stage than they do at other stages, the genes they significantly regulate are also stage-specific. Global transcriptome analyses of perturbations have revealed an underlying modular structure to the T-cell commitment GRN, separating decisions to lose "stem-ness" from decisions to block alternative fates. Finally, the updated network sheds light on the intimate relationship between the T-cell program, which depends on the thymus, and the innate lymphoid cell (ILC) program, which does not.
Affiliation(s)
- Boyoung Shin
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, United States
| | - Ellen V. Rothenberg
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, United States
|
41
|
Kawaguchi RK, Tang Z, Fischer S, Rajesh C, Tripathy R, Koo PK, Gillis J. Learning single-cell chromatin accessibility profiles using meta-analytic marker genes. Brief Bioinform 2023; 24:bbac541. [PMID: 36549922 PMCID: PMC9851328 DOI: 10.1093/bib/bbac541] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2022] [Revised: 09/29/2022] [Accepted: 11/08/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION: Single-cell assay for transposase accessible chromatin using sequencing (scATAC-seq) is a valuable resource to learn cis-regulatory elements such as cell-type specific enhancers and transcription factor binding sites. However, cell-type identification of scATAC-seq data is known to be challenging due to the heterogeneity derived from different protocols and the high dropout rate. RESULTS: In this study, we perform a systematic comparison of seven scATAC-seq datasets of mouse brain to benchmark the efficacy of neuronal cell-type annotation from gene sets. We find that redundant marker genes give a dramatic improvement for a sparse scATAC-seq annotation across the data collected from different studies. Interestingly, simple aggregation of such marker genes achieves performance comparable or higher than that of machine-learning classifiers, suggesting its potential for downstream applications. Based on our results, we reannotated all scATAC-seq data for detailed cell types using robust marker genes. Their meta scATAC-seq profiles are publicly available at https://gillisweb.cshl.edu/Meta_scATAC. Furthermore, we trained a deep neural network to predict chromatin accessibility from only DNA sequence and identified key motifs enriched for each neuronal subtype. Those predicted profiles are visualized together in our database as a valuable resource to explore cell-type specific epigenetic regulation in a sequence-dependent and -independent manner.
Affiliation(s)
| | - Ziqi Tang
- Cold Spring Harbor Laboratory, Cold Spring Harbor 11724, USA
| | - Stephan Fischer
- Cold Spring Harbor Laboratory, Cold Spring Harbor 11724, USA
| | - Chandana Rajesh
- Cold Spring Harbor Laboratory, Cold Spring Harbor 11724, USA
| | - Rohit Tripathy
- Cold Spring Harbor Laboratory, Cold Spring Harbor 11724, USA
| | - Peter K Koo
- Cold Spring Harbor Laboratory, Cold Spring Harbor 11724, USA
| | - Jesse Gillis
- Cold Spring Harbor Laboratory, Cold Spring Harbor 11724, USA
- Department of Physiology and Donnelly Centre for Cellular & Biomolecular Research Department, University of Toronto, Ontario M5S 3E1, Canada
|
42
|
George S, Martin JAJ, Graziani V, Sanz-Moreno V. Amoeboid migration in health and disease: Immune responses versus cancer dissemination. Front Cell Dev Biol 2023; 10:1091801. [PMID: 36699013 PMCID: PMC9869768 DOI: 10.3389/fcell.2022.1091801] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 11/07/2022] [Accepted: 12/15/2022] [Indexed: 01/07/2023] Open
Abstract
Cell migration is crucial for efficient immune responses and is aberrantly used by cancer cells during metastatic dissemination. Amoeboid migrating cells use myosin II-powered blebs to propel themselves, and change morphology and direction. Immune cells use amoeboid strategies to respond rapidly to infection or tissue damage, which require quick passage through several barriers, including blood, lymph and interstitial tissues, with complex and varied environments. Amoeboid migration is also used by metastatic cancer cells to aid their migration, dissemination and survival, whereby key mechanisms are hijacked from professionally motile immune cells. We explore important parallels observed between amoeboid immune and cancer cells. We also consider key distinctions that separate the lifespan, state and fate of these cell types as they migrate and/or fulfil their function. Finally, we reflect on unexplored areas of research that would enhance our understanding of how tumour cells use immune cell strategies during metastasis, and how to target these processes.
|
43
|
Milanese JS, Marcotte R, Costain WJ, Kablar B, Drouin S. Roles of Skeletal Muscle in Development: A Bioinformatics and Systems Biology Overview. Adv Anat Embryol Cell Biol 2023; 236:21-55. [PMID: 37955770 DOI: 10.1007/978-3-031-38215-4_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2023]
Abstract
The ability to assess various cellular events consequent to perturbations, such as genetic mutations, disease states and therapies, has been recently revolutionized by technological advances in multiple "omics" fields. The resulting deluge of information has enabled and necessitated the development of tools required to both process and interpret the data. While of tremendous value to basic researchers, the amount and complexity of the data has made it extremely difficult to manually draw inference and identify factors key to the study objectives. The challenges of data reduction and interpretation are being met by the development of increasingly complex tools that integrate disparate knowledge bases and synthesize coherent models based on current biological understanding. This chapter presents an example of how genomics data can be integrated with biological network analyses to gain further insight into the developmental consequences of genetic perturbations. State of the art methods for conducting similar studies are discussed along with modern methods used to analyze and interpret the data.
Affiliation(s)
| | - Richard Marcotte
- Human Health Therapeutics, National Research Council of Canada, Montreal, QC, Canada
| | - Willard J Costain
- Human Health Therapeutics, National Research Council of Canada, Ottawa, ON, Canada
| | - Boris Kablar
- Department of Medical Neuroscience, Anatomy and Pathology, Faculty of Medicine, Dalhousie University, Halifax, NS, Canada
| | - Simon Drouin
- Human Health Therapeutics, National Research Council of Canada, Montreal, QC, Canada
|
44
|
Alatawneh R, Salomon Y, Eshel R, Orenstein Y, Birnbaum RY. Deciphering transcription factors and their corresponding regulatory elements during inhibitory interneuron differentiation using deep neural networks. Front Cell Dev Biol 2023; 11:1034604. [PMID: 36891511 PMCID: PMC9986276 DOI: 10.3389/fcell.2023.1034604] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2022] [Accepted: 01/23/2023] [Indexed: 02/22/2023] Open
Abstract
During neurogenesis, the generation and differentiation of neuronal progenitors into inhibitory gamma-aminobutyric acid-containing interneurons is dependent on the combinatorial activity of transcription factors (TFs) and their corresponding regulatory elements (REs). However, the roles of neuronal TFs and their target REs in inhibitory interneuron progenitors are not fully elucidated. Here, we developed a deep-learning-based framework to identify enriched TF motifs in gene REs (eMotif-RE), such as poised/repressed enhancers and putative silencers. Using epigenetic datasets (e.g., ATAC-seq and H3K27ac/me3 ChIP-seq) from cultured interneuron-like progenitors, we distinguished between active enhancer sequences (open chromatin with H3K27ac) and non-active enhancer sequences (open chromatin without H3K27ac). Using our eMotif-RE framework, we discovered enriched motifs of TFs such as ASCL1, SOX4, and SOX11 in the active enhancer set, suggesting a cooperative function for ASCL1 and SOX4/11 in active enhancers of neuronal progenitors. In addition, we found enriched ZEB1 and CTCF motifs in the non-active set. Using an in vivo enhancer assay, we showed that most of the tested putative REs from the non-active enhancer set have no enhancer activity. Two of the eight REs (25%) functioned as poised enhancers in the neuronal system. Moreover, REs with mutated ZEB1 and CTCF motifs showed increased in vivo enhancer activity, indicating a repressive effect of ZEB1 and CTCF on these REs, which likely function as repressed enhancers or silencers. Overall, our work integrates a novel framework based on deep learning together with a functional assay that elucidated novel functions of TFs and their corresponding REs. Our approach can be applied to better understand gene regulation not only in inhibitory interneuron differentiation but also in other tissues and cell types.
Affiliation(s)
- Rawan Alatawneh
- Department of Life Sciences, Faculty of Natural Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel; The Zlotowski Center for Neuroscience, Ben-Gurion University of the Negev, Beer-Sheva, Israel
| | - Yahel Salomon
- School of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel
| | - Reut Eshel
- Department of Life Sciences, Faculty of Natural Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel; The Zlotowski Center for Neuroscience, Ben-Gurion University of the Negev, Beer-Sheva, Israel
| | - Yaron Orenstein
- School of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel; Department of Computer Science, Bar-Ilan University, Ramat Gan, Israel; The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan, Israel
| | - Ramon Y Birnbaum
- Department of Life Sciences, Faculty of Natural Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel; The Zlotowski Center for Neuroscience, Ben-Gurion University of the Negev, Beer-Sheva, Israel
|
45
|
Cazares TA, Rizvi FW, Iyer B, Chen X, Kotliar M, Bejjani AT, Wayman JA, Donmez O, Wronowski B, Parameswaran S, Kottyan LC, Barski A, Weirauch MT, Prasath VBS, Miraldi ER. maxATAC: Genome-scale transcription-factor binding prediction from ATAC-seq with deep neural networks. PLoS Comput Biol 2023; 19:e1010863. [PMID: 36719906 PMCID: PMC9917285 DOI: 10.1371/journal.pcbi.1010863] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 07/06/2022] [Revised: 02/10/2023] [Accepted: 01/10/2023] [Indexed: 02/01/2023] Open
Abstract
Transcription factors read the genome, fundamentally connecting DNA sequence to gene expression across diverse cell types. Determining how, where, and when TFs bind chromatin will advance our understanding of gene regulatory networks and cellular behavior. The 2017 ENCODE-DREAM in vivo Transcription-Factor Binding Site (TFBS) Prediction Challenge highlighted the value of chromatin accessibility data to TFBS prediction, establishing state-of-the-art methods for TFBS prediction from DNase-seq. However, the more recent Assay-for-Transposase-Accessible-Chromatin (ATAC)-seq has surpassed DNase-seq as the most widely-used chromatin accessibility profiling method. Furthermore, ATAC-seq is the only such technique available at single-cell resolution from standard commercial platforms. While ATAC-seq datasets grow exponentially, suboptimal motif scanning is unfortunately the most common method for TFBS prediction from ATAC-seq. To enable community access to state-of-the-art TFBS prediction from ATAC-seq, we (1) curated an extensive benchmark dataset (127 TFs) for ATAC-seq model training and (2) built "maxATAC", a suite of user-friendly, deep neural network models for genome-wide TFBS prediction from ATAC-seq in any cell type. With models available for 127 human TFs, maxATAC is the largest collection of high-performance TFBS prediction models for ATAC-seq. maxATAC performance extends to primary cells and single-cell ATAC-seq, enabling improved TFBS prediction in vivo. We demonstrate maxATAC's capabilities by identifying TFBS associated with allele-dependent chromatin accessibility at atopic dermatitis genetic risk loci.
Affiliation(s)
- Tareian A. Cazares
- Immunology Graduate Program, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States of America
| | - Faiz W. Rizvi
- Systems Biology and Physiology Graduate Program, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States of America
| | - Balaji Iyer
- Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- Department of Electrical Engineering and Computer Science, University of Cincinnati, Cincinnati, Ohio, United States of America
| | - Xiaoting Chen
- The Center for Autoimmune Genetics and Etiology (CAGE), Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
| | - Michael Kotliar
- Division of Allergy and Immunology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
| | - Anthony T. Bejjani
- Molecular and Developmental Biology Graduate Program, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States of America
| | - Joseph A. Wayman
- Division of Immunobiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
| | - Omer Donmez
- The Center for Autoimmune Genetics and Etiology (CAGE), Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
| | - Benjamin Wronowski
- Division of Allergy and Immunology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
| | - Sreeja Parameswaran
- The Center for Autoimmune Genetics and Etiology (CAGE), Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
| | - Leah C. Kottyan
- The Center for Autoimmune Genetics and Etiology (CAGE), Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States of America
- Division of Human Genetics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
| | - Artem Barski
- Division of Allergy and Immunology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States of America
- Division of Human Genetics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
| | - Matthew T. Weirauch
- Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- The Center for Autoimmune Genetics and Etiology (CAGE), Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States of America
- Division of Human Genetics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- Division of Developmental Biology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
| | - V. B. Surya Prasath
- Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- Department of Electrical Engineering and Computer Science, University of Cincinnati, Cincinnati, Ohio, United States of America
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States of America
| | - Emily R. Miraldi
- Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- Department of Electrical Engineering and Computer Science, University of Cincinnati, Cincinnati, Ohio, United States of America
- Division of Immunobiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States of America
|
46
|
Toneyan S, Tang Z, Koo PK. Evaluating deep learning for predicting epigenomic profiles. Nat Mach Intell 2022. [DOI: 10.1038/s42256-022-00570-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/12/2022]
|
47
|
Current challenges in understanding the role of enhancers in disease. Nat Struct Mol Biol 2022; 29:1148-1158. [PMID: 36482255 DOI: 10.1038/s41594-022-00896-3] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Received: 09/30/2022] [Accepted: 11/04/2022] [Indexed: 12/13/2022]
Abstract
Enhancers play a central role in the spatiotemporal control of gene expression and tend to work in a cell-type-specific manner. In addition, they are suggested to be major contributors to phenotypic variation, evolution and disease. There is growing evidence that enhancer dysfunction due to genetic, structural or epigenetic mechanisms contributes to a broad range of human diseases referred to as enhanceropathies. Such mechanisms often underlie the susceptibility to common diseases, but can also play a direct causal role in cancer or Mendelian diseases. Despite the recent gain of insights into enhancer biology and function, we still have a limited ability to predict how enhancer dysfunction impacts gene expression. Here we discuss the major challenges that need to be overcome when studying the role of enhancers in disease etiology and highlight opportunities and directions for future studies, aiming to disentangle the molecular basis of enhanceropathies.
|
48
|
Chambost AJ, Berabez N, Cochet-Escartin O, Ducray F, Gabut M, Isaac C, Martel S, Idbaih A, Rousseau D, Meyronet D, Monnier S. Machine learning-based detection of label-free cancer stem-like cell fate. Sci Rep 2022; 12:19066. [PMID: 36352045 PMCID: PMC9646748 DOI: 10.1038/s41598-022-21822-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/24/2021] [Accepted: 10/04/2022] [Indexed: 11/11/2022] Open
Abstract
The detection of cancer stem-like cells (CSCs) is mainly based on molecular markers or functional tests giving a posteriori results. Therefore, label-free and real-time detection of single CSCs remains a difficult challenge. The recent development of microfluidics has made it possible to perform high-throughput single-cell imaging under controlled conditions and geometries. Such throughput requires adapted image analysis pipelines while providing the necessary amount of data for the development of machine-learning algorithms. In this paper, we provide a data-driven study to assess the complexity of brightfield time-lapses to monitor the fate of isolated cancer stem-like cells in non-adherent conditions. We combined, for the first time, individual cell fate and cell state temporality analysis in a single algorithm. We show that, with our experimental system and on two different primary cell lines, our optimized deep learning-based algorithm outperforms classical computer vision and shallow learning-based algorithms in terms of accuracy while being faster than cutting-edge convolutional neural networks (CNNs). With this study, we show that tailoring our deep learning-based algorithm to the image analysis problem yields better results than pre-trained models. As a result, such a rapid and accurate CNN is compatible with the rise of high-throughput data generation and opens the door to on-the-fly CSC fate analysis.
Affiliation(s)
- Alexis J. Chambost
- Cancer Initiation and Tumor Cell Identity Department, Cancer Research Centre of Lyon (CRCL) INSERM 1052, CNRS UMR5286, Centre Léon Bérard, Université Claude Bernard Lyon 1, 69008 Lyon, Villeurbanne, France; Univ Lyon, CNRS, Institut Lumière Matière, Univ Claude Bernard Lyon 1, 69622 Villeurbanne, France; Pathology Institute, Hospices Civils de Lyon, Lyon, France
| | - Nabila Berabez
- Cancer Initiation and Tumor Cell Identity Department, Cancer Research Centre of Lyon (CRCL) INSERM 1052, CNRS UMR5286, Centre Léon Bérard, Université Claude Bernard Lyon 1, 69008 Lyon, Villeurbanne, France
| | - Olivier Cochet-Escartin
- Univ Lyon, CNRS, Institut Lumière Matière, Univ Claude Bernard Lyon 1, 69622 Villeurbanne, France
| | - François Ducray
- Cancer Initiation and Tumor Cell Identity Department, Cancer Research Centre of Lyon (CRCL) INSERM 1052, CNRS UMR5286, Centre Léon Bérard, Université Claude Bernard Lyon 1, 69008 Lyon, Villeurbanne, France; Neuro-oncology Department, Hospices Civils de Lyon, Lyon, France
| | - Mathieu Gabut
- Cancer Initiation and Tumor Cell Identity Department, Cancer Research Centre of Lyon (CRCL) INSERM 1052, CNRS UMR5286, Centre Léon Bérard, Université Claude Bernard Lyon 1, 69008 Lyon, Villeurbanne, France
| | - Caroline Isaac
- Cancer Initiation and Tumor Cell Identity Department, Cancer Research Centre of Lyon (CRCL) INSERM 1052, CNRS UMR5286, Centre Léon Bérard, Université Claude Bernard Lyon 1, 69008 Lyon, Villeurbanne, France
| | - Sylvie Martel
- Cancer Initiation and Tumor Cell Identity Department, Cancer Research Centre of Lyon (CRCL) INSERM 1052, CNRS UMR5286, Centre Léon Bérard, Université Claude Bernard Lyon 1, 69008 Lyon, Villeurbanne, France
| | - Ahmed Idbaih
- Institut du Cerveau - Paris Brain Institute - ICM, Inserm, CNRS, AP-HP, Hôpital Universitaire La Pitié Salpêtrière, DMU Neurosciences, Sorbonne Université, Paris, France
| | - David Rousseau
- Laboratoire Angevin de Recherche en Ingénierie des Systèmes (LARIS), UMR Inrae IRHS, Université d’Angers, 49000 Angers, France
| | - David Meyronet
- Cancer Initiation and Tumor Cell Identity Department, Cancer Research Centre of Lyon (CRCL) INSERM 1052, CNRS UMR5286, Centre Léon Bérard, Université Claude Bernard Lyon 1, 69008 Lyon, Villeurbanne, France; Pathology Institute, Hospices Civils de Lyon, Lyon, France
| | - Sylvain Monnier
- Univ Lyon, CNRS, Institut Lumière Matière, Univ Claude Bernard Lyon 1, 69622 Villeurbanne, France
|
49
|
Majdandzic A, Rajesh C, Tang A, Toneyan S, Labelson E, Tripathy R, Koo PK. Selecting deep neural networks that yield consistent attribution-based interpretations for genomics. Proceedings of Machine Learning Research 2022; 200:131-149. [PMID: 37205975 PMCID: PMC10194041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/21/2023]
Abstract
Deep neural networks (DNNs) have advanced our ability to take DNA primary sequence as input and predict a myriad of molecular activities measured via high-throughput functional genomic assays. Post hoc attribution analysis has been employed to provide insights into the importance of features learned by DNNs, often revealing patterns such as sequence motifs. However, attribution maps typically harbor spurious importance scores to an extent that varies from model to model, even for DNNs whose predictions generalize well. Thus, the standard approach for model selection, which relies on performance of a held-out validation set, does not guarantee that a high-performing DNN will provide reliable explanations. Here we introduce two approaches that quantify the consistency of important features across a population of attribution maps; consistency reflects a qualitative property of human interpretable attribution maps. We employ the consistency metrics as part of a multivariate model selection framework to identify models that yield high generalization performance and interpretable attribution analysis. We demonstrate the efficacy of this approach across various DNNs quantitatively with synthetic data and qualitatively with chromatin accessibility data.
Affiliation(s)
| | - Chandana Rajesh
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory
| | - Amber Tang
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory
| | - Shushan Toneyan
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory
| | - Ethan Labelson
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory
| | - Rohit Tripathy
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory
| | - Peter K Koo
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory
|
50
|
Li J, Wang J, Zhang P, Wang R, Mei Y, Sun Z, Fei L, Jiang M, Ma L, E W, Chen H, Wang X, Fu Y, Wu H, Liu D, Wang X, Li J, Guo Q, Liao Y, Yu C, Jia D, Wu J, He S, Liu H, Ma J, Lei K, Chen J, Han X, Guo G. Deep learning of cross-species single-cell landscapes identifies conserved regulatory programs underlying cell types. Nat Genet 2022; 54:1711-1720. [PMID: 36229673 DOI: 10.1038/s41588-022-01197-7] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 07/18/2021] [Accepted: 08/31/2022] [Indexed: 11/09/2022]
Abstract
Despite extensive efforts to generate and analyze reference genomes, genetic models to predict gene regulation and cell fate decisions are lacking for most species. Here, we generated whole-body single-cell transcriptomic landscapes of zebrafish, Drosophila and earthworm. We then integrated cell landscapes from eight representative metazoan species to study gene regulation across evolution. Using these uniformly constructed cross-species landscapes, we developed a deep-learning-based strategy, Nvwa, to predict gene expression and identify regulatory sequences at the single-cell level. We systematically compared cell-type-specific transcription factors to reveal conserved genetic regulation in vertebrates and invertebrates. Our work provides a valuable resource and offers a new strategy for studying regulatory grammar in diverse biological systems.
Affiliation(s)
- Jiaqi Li
- Center for Stem Cell and Regenerative Medicine and Bone Marrow Transplantation Center of the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China; Liangzhu Laboratory, Zhejiang University Medical Center, Hangzhou, China
| | - Jingjing Wang
- Center for Stem Cell and Regenerative Medicine and Bone Marrow Transplantation Center of the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China; Liangzhu Laboratory, Zhejiang University Medical Center, Hangzhou, China
| | - Peijing Zhang
- Center for Stem Cell and Regenerative Medicine and Bone Marrow Transplantation Center of the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China; Liangzhu Laboratory, Zhejiang University Medical Center, Hangzhou, China
| | - Renying Wang
- Center for Stem Cell and Regenerative Medicine and Bone Marrow Transplantation Center of the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
| | - Yuqing Mei
- Center for Stem Cell and Regenerative Medicine and Bone Marrow Transplantation Center of the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
| | - Zhongyi Sun
- Center for Stem Cell and Regenerative Medicine and Bone Marrow Transplantation Center of the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
| | - Lijiang Fei
- Center for Stem Cell and Regenerative Medicine and Bone Marrow Transplantation Center of the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
| | - Mengmeng Jiang
- Center for Stem Cell and Regenerative Medicine and Bone Marrow Transplantation Center of the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China; Liangzhu Laboratory, Zhejiang University Medical Center, Hangzhou, China
| | - Lifeng Ma
- Center for Stem Cell and Regenerative Medicine and Bone Marrow Transplantation Center of the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
| | - Weigao E
- Center for Stem Cell and Regenerative Medicine and Bone Marrow Transplantation Center of the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
| | - Haide Chen
- Center for Stem Cell and Regenerative Medicine and Bone Marrow Transplantation Center of the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China; Liangzhu Laboratory, Zhejiang University Medical Center, Hangzhou, China
| | - Xinru Wang
- Center for Stem Cell and Regenerative Medicine and Bone Marrow Transplantation Center of the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
| | - Yuting Fu
- Center for Stem Cell and Regenerative Medicine and Bone Marrow Transplantation Center of the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
| | - Hanyu Wu
- Center for Stem Cell and Regenerative Medicine and Bone Marrow Transplantation Center of the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
| | - Daiyuan Liu
- Center for Stem Cell and Regenerative Medicine and Bone Marrow Transplantation Center of the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
| | - Xueyi Wang
- Center for Stem Cell and Regenerative Medicine and Bone Marrow Transplantation Center of the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
| | - Jingyu Li
- Center for Stem Cell and Regenerative Medicine and Bone Marrow Transplantation Center of the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
| | - Qile Guo
- Zhejiang University-University of Edinburgh Institute, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, China
| | - Yuan Liao
- Center for Stem Cell and Regenerative Medicine and Bone Marrow Transplantation Center of the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China; Zhejiang Provincial Key Laboratory for Tissue Engineering and Regenerative Medicine, Dr. Li Dak Sum & Yip Yio Chin Center for Stem Cell and Regenerative Medicine, Hangzhou, China
| | - Chengxuan Yu
- Center for Stem Cell and Regenerative Medicine and Bone Marrow Transplantation Center of the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
| | - Danmei Jia
- Center for Stem Cell and Regenerative Medicine and Bone Marrow Transplantation Center of the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
| | - Jian Wu
- Division of Hepatobiliary and Pancreatic Surgery, Department of Surgery, First Affiliated Hospital School of Medicine, Zhejiang University, Hangzhou, China
| | - Shibo He
- College of Control Science and Engineering, Zhejiang University, Hangzhou, China
| | - Huanju Liu
- Women's Hospital and Institute of Genetics, Zhenjiang University School of Medicine, Hangzhou, China
| | - Jun Ma
- Women's Hospital and Institute of Genetics, Zhenjiang University School of Medicine, Hangzhou, China
| | - Kai Lei
- Westlake Laboratory of Life Sciences and Biomedicine, Key Laboratory of Growth Regulation and Translational Research of Zhejiang Province, School of Life Sciences, Westlake University, Hangzhou, China
| | - Jiming Chen
- College of Control Science and Engineering, Zhejiang University, Hangzhou, China
| | - Xiaoping Han
- Center for Stem Cell and Regenerative Medicine and Bone Marrow Transplantation Center of the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China; Zhejiang Provincial Key Laboratory for Tissue Engineering and Regenerative Medicine, Dr. Li Dak Sum & Yip Yio Chin Center for Stem Cell and Regenerative Medicine, Hangzhou, China
| | - Guoji Guo
- Center for Stem Cell and Regenerative Medicine and Bone Marrow Transplantation Center of the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China; Liangzhu Laboratory, Zhejiang University Medical Center, Hangzhou, China; Zhejiang University-University of Edinburgh Institute, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, China; Zhejiang Provincial Key Laboratory for Tissue Engineering and Regenerative Medicine, Dr. Li Dak Sum & Yip Yio Chin Center for Stem Cell and Regenerative Medicine, Hangzhou, China; Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou, China
|