1
Arulsamy K, Xia B, Yu Y, Chen H, Pu WT, Zhang L, Chen K. SCIG: Machine learning uncovers cell identity genes in single cells by genetic sequence codes. Nucleic Acids Res 2025; 53:gkaf431. [PMID: 40433981 PMCID: PMC12117433 DOI: 10.1093/nar/gkaf431] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2024] [Revised: 04/09/2025] [Accepted: 05/09/2025] [Indexed: 05/29/2025]
Abstract
Deciphering cell identity genes is pivotal to understanding cell differentiation, development, and cell identity dysregulation involving diseases. Here, we introduce SCIG, a machine-learning method to uncover cell identity genes in single cells. In alignment with recent reports that cell identity genes (CIGs) are regulated with unique epigenetic signatures, we found CIGs exhibit distinctive genetic sequence signatures, e.g. unique enrichment patterns of cis-regulatory elements. Using these genetic sequence signatures, along with gene expression information from single-cell RNA-seq data, SCIG uncovers the identity genes of a cell without a need for comparison to other cells. CIG score defined by SCIG surpassed expression value in network analysis to reveal the master transcription factors (TFs) regulating cell identity. Applying SCIG to the human endothelial cell atlas revealed that the tissue microenvironment is a critical supplement to master TFs for cell identity refinement. SCIG is publicly available at https://doi.org/10.5281/zenodo.14726426, offering a valuable tool for advancing cell differentiation, development, and regenerative medicine research.
Affiliation(s)
- Kulandaisamy Arulsamy
  - Basic and Translational Research Division, Department of Cardiology, Boston Children’s Hospital, Boston, MA 02115, United States
  - Department of Pediatrics, Harvard Medical School, Boston, MA 02115, United States
- Bo Xia
  - Independent Researcher, Clemson, United States
- Yang Yu
  - Basic and Translational Research Division, Department of Cardiology, Boston Children’s Hospital, Boston, MA 02115, United States
  - Department of Pediatrics, Harvard Medical School, Boston, MA 02115, United States
- Hong Chen
  - Vascular Biology Program, Boston Children’s Hospital and Harvard Medical School, Boston, MA 02115, United States
- William T Pu
  - Basic and Translational Research Division, Department of Cardiology, Boston Children’s Hospital, Boston, MA 02115, United States
  - Department of Pediatrics, Harvard Medical School, Boston, MA 02115, United States
- Lili Zhang
  - Basic and Translational Research Division, Department of Cardiology, Boston Children’s Hospital, Boston, MA 02115, United States
  - Department of Pediatrics, Harvard Medical School, Boston, MA 02115, United States
- Kaifu Chen
  - Basic and Translational Research Division, Department of Cardiology, Boston Children’s Hospital, Boston, MA 02115, United States
  - Department of Pediatrics, Harvard Medical School, Boston, MA 02115, United States
2
Wan C, Qu Y, Ye Z, Zhang T, Ma H, Chen M, Hou W, Ji Z. Comparative analysis of gene regulation in single cells using Compass. CELL REPORTS METHODS 2025; 5:101035. [PMID: 40345198 DOI: 10.1016/j.crmeth.2025.101035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2024] [Revised: 02/14/2025] [Accepted: 04/14/2025] [Indexed: 05/11/2025]
Abstract
Single-cell multi-omics is a transformative technology that measures both gene expression and chromatin accessibility in individual cells. However, most studies concentrate on a single tissue and are unable to determine whether a gene is regulated by a cis-regulatory element (CRE) in just one tissue or across multiple tissues. We developed Compass for comparative analysis of gene regulation across a large number of human and mouse tissues. Compass consists of a database, CompassDB, and an open-source R software package, CompassR. CompassDB contains processed single-cell multi-omics data of more than 2.8 million cells from hundreds of cell types. Building upon CompassDB, CompassR enables visualization and comparison of gene regulation across multiple tissues. We demonstrated that CompassR can identify CRE-gene linkages specific to a tissue type and their associated transcription factors in real examples.
Affiliation(s)
- Changxin Wan
  - Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC, USA; Program of Computational Biology and Bioinformatics, Duke University School of Medicine, Durham, NC, USA
- Yilong Qu
  - Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC, USA
- Zhiyou Ye
  - Department of Biomedical Engineering, Pratt School of Engineering, Duke University, Durham, NC, USA
- Tianbei Zhang
  - Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC, USA
- Huifang Ma
  - Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC, USA
- Ming Chen
  - Department of Pathology, Duke University School of Medicine, Durham, NC, USA; Duke Cancer Institute, Duke University, Durham, NC, USA
- Wenpin Hou
  - Department of Biostatistics, Columbia University Mailman School of Public Health, New York City, NY, USA
- Zhicheng Ji
  - Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC, USA; Program of Computational Biology and Bioinformatics, Duke University School of Medicine, Durham, NC, USA
3
Qiu X, Zhu DY, Lu Y, Yao J, Jing Z, Min KH, Cheng M, Pan H, Zuo L, King S, Fang Q, Zheng H, Wang M, Wang S, Zhang Q, Yu S, Liao S, Liu C, Wu X, Lai Y, Hao S, Zhang Z, Wu L, Zhang Y, Li M, Tu Z, Lin J, Yang Z, Li Y, Gu Y, Ellison D, Chen A, Liu L, Weissman JS, Ma J, Xu X, Liu S, Bai Y. Spatiotemporal modeling of molecular holograms. Cell 2024; 187:7351-7373.e61. [PMID: 39532097 DOI: 10.1016/j.cell.2024.10.011] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/09/2022] [Revised: 05/29/2024] [Accepted: 10/08/2024] [Indexed: 11/16/2024]
Abstract
Quantifying spatiotemporal dynamics during embryogenesis is crucial for understanding congenital diseases. We developed Spateo (https://github.com/aristoteleo/spateo-release), a 3D spatiotemporal modeling framework, and applied it to a 3D mouse embryogenesis atlas at E9.5 and E11.5, capturing eight million cells. Spateo enables scalable, partial, non-rigid alignment, multi-slice refinement, and mesh correction to create molecular holograms of whole embryos. It introduces digitization methods to uncover multi-level biology from subcellular to whole organ, identifying expression gradients along orthogonal axes of emergent 3D structures, e.g., secondary organizers such as midbrain-hindbrain boundary (MHB). Spateo further jointly models intercellular and intracellular interaction to dissect signaling landscapes in 3D structures, including the zona limitans intrathalamica (ZLI). Lastly, Spateo introduces "morphometric vector fields" of cell migration and integrates spatial differential geometry to unveil molecular programs underlying asymmetrical murine heart organogenesis and others, bridging macroscopic changes with molecular dynamics. Thus, Spateo enables the study of organ ecology at a molecular level in 3D space over time.
Affiliation(s)
- Xiaojie Qiu
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA; Basic Sciences and Engineering Initiative, Betty Irene Moore Children's Heart Center, Lucile Packard Children's Hospital, Stanford, CA, USA; Department of Computer Science, Stanford University, Stanford, CA 94305, USA; Stanford Cardiovascular Institute, Stanford University, Stanford, CA, USA
- Daniel Y Zhu
  - Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Yifan Lu
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA; Basic Sciences and Engineering Initiative, Betty Irene Moore Children's Heart Center, Lucile Packard Children's Hospital, Stanford, CA, USA; Department of Computer Science, Stanford University, Stanford, CA 94305, USA; Electronic Information School, Wuhan University, Wuhan 430072, China
- Jiajun Yao
  - BGI Research, Hangzhou 310030, China; BGI Research, Sanya 572025, China; College of Life Sciences, Northwest University, Xi'an 710069, China
- Zehua Jing
  - BGI Research, Hangzhou 310030, China; BGI Research, Sanya 572025, China; College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Kyung Hoi Min
  - Ginkgo Bioworks, The Innovation and Design Building, Boston, MA 02210, USA
- Mengnan Cheng
  - BGI Research, Hangzhou 310030, China; BGI Research, Shenzhen 518083, China
- Lulu Zuo
  - BGI Research, Shenzhen 518083, China
- Samuel King
  - Department of Bioengineering, Stanford University School of Medicine, Stanford, CA, USA
- Qi Fang
  - BGI Research, Hangzhou 310030, China; BGI Research, Shenzhen 518083, China
- Huiwen Zheng
  - BGI Research, Hangzhou 310030, China; College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Mingyue Wang
  - BGI Research, Hangzhou 310030, China; Shenzhen Key Laboratory of Gene Regulation and Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
- Shuai Wang
  - BGI Research, Hangzhou 310030, China; College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Qingquan Zhang
  - Department of Medicine, Division of Cardiology, University of California, San Diego, La Jolla, CA, USA
- Sichao Yu
  - Whitehead Institute for Biomedical Research, Cambridge, MA, USA
- Sha Liao
  - BGI Research, Shenzhen 518083, China; STOmics Tech Co., Ltd, Shenzhen 518083, China; BGI Research, Chongqing 401329, China
- Chao Liu
  - BGI Research, Wuhan 430074, China
- Xinchao Wu
  - BGI Research, Hangzhou 310030, China; BGI Research, Sanya 572025, China; School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
- Yiwei Lai
  - BGI Research, Shenzhen 518083, China
- Zhewei Zhang
  - BGI Research, Hangzhou 310030, China; BGI Research, Sanya 572025, China; School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
- Liang Wu
  - BGI Research, Chongqing 401329, China
- Mei Li
  - STOmics Tech Co., Ltd, Shenzhen 518083, China
- Zhencheng Tu
  - BGI Research, Hangzhou 310030, China; College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Jinpei Lin
  - BGI Research, Hangzhou 310030, China; BGI Research, Sanya 572025, China
- Zhuoxuan Yang
  - BGI Research, Hangzhou 310030, China; School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
- Ying Gu
  - BGI Research, Hangzhou 310030, China; BGI Research, Shenzhen 518083, China; College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Ao Chen
  - BGI Research, Shenzhen 518083, China; STOmics Tech Co., Ltd, Shenzhen 518083, China; BGI Research, Chongqing 401329, China
- Longqi Liu
  - BGI Research, Hangzhou 310030, China; Shenzhen Bay Laboratory, Shenzhen 518132, China; Shenzhen Key Laboratory of Single-Cell Omics, BGI-Shenzhen, Shenzhen 518120, China
- Jonathan S Weissman
  - Whitehead Institute for Biomedical Research, Cambridge, MA, USA; Department of Biology and Howard Hughes Medical Institute, Massachusetts Institute of Technology, Cambridge, MA, USA; Koch Institute for Integrative Cancer Research at MIT, MIT, Cambridge, MA, USA
- Jiayi Ma
  - Electronic Information School, Wuhan University, Wuhan 430072, China
- Xun Xu
  - BGI Research, Hangzhou 310030, China; College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China; Guangdong Provincial Key Laboratory of Genome Read and Write, BGI-Shenzhen, Shenzhen 518120, China
- Shiping Liu
  - BGI Research, Hangzhou 310030, China; Shenzhen Bay Laboratory, Shenzhen 518132, China; Shenzhen Key Laboratory of Single-Cell Omics, BGI-Shenzhen, Shenzhen 518120, China; The Guangdong-Hong Kong Joint Laboratory on Immunological and Genetic Kidney Diseases, Guangzhou, Guangdong, China
- Yinqi Bai
  - BGI Research, Sanya 572025, China; Hainan Technology Innovation Center for Marine Biological Resources Utilization (Preparatory Period), BGI Research, Sanya 572025, China
5
George RM, Firulli BA, Podicheti R, Rusch DB, Mannion BJ, Pennacchio LA, Osterwalder M, Firulli AB. Single cell evaluation of endocardial Hand2 gene regulatory networks reveals HAND2-dependent pathways that impact cardiac morphogenesis. Development 2023; 150:dev201341. [PMID: 36620995 PMCID: PMC10110492 DOI: 10.1242/dev.201341] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 09/30/2022] [Accepted: 12/26/2022] [Indexed: 01/10/2023]
Abstract
The transcription factor HAND2 plays essential roles during cardiogenesis. Hand2 endocardial deletion (H2CKO) results in tricuspid atresia or double inlet left ventricle with accompanying intraventricular septum defects, hypo-trabeculated ventricles and an increased density of coronary lumens. To understand the regulatory mechanisms of these phenotypes, single cell transcriptome analysis of mouse E11.5 H2CKO hearts was performed revealing a number of disrupted endocardial regulatory pathways. Using HAND2 DNA occupancy data, we identify several HAND2-dependent enhancers, including two endothelial enhancers for the shear-stress master regulator KLF2. A 1.8 kb enhancer located 50 kb upstream of the Klf2 TSS imparts specific endothelial/endocardial expression within the vasculature and endocardium. This enhancer is HAND2-dependent for ventricular endocardium expression but HAND2-independent for Klf2 vascular and valve expression. Deletion of this Klf2 enhancer results in reduced Klf2 expression within ventricular endocardium. These data reveal that HAND2 functions within endocardial gene regulatory networks including shear-stress response.
Affiliation(s)
- Rajani M. George
  - Herman B Wells Center for Pediatric Research, Departments of Pediatrics, Anatomy and Medical and Molecular Genetics, Indiana Medical School, Indianapolis, IN 46202, USA
- Beth A. Firulli
  - Herman B Wells Center for Pediatric Research, Departments of Pediatrics, Anatomy and Medical and Molecular Genetics, Indiana Medical School, Indianapolis, IN 46202, USA
- Ram Podicheti
  - Center for Genomics and Bioinformatics, Indiana University, Bloomington, IN 47405, USA
- Douglas B. Rusch
  - Center for Genomics and Bioinformatics, Indiana University, Bloomington, IN 47405, USA
- Brandon J. Mannion
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
  - Comparative Biochemistry Program, University of California, Berkeley, CA 94720, USA
- Len A. Pennacchio
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
  - Comparative Biochemistry Program, University of California, Berkeley, CA 94720, USA
  - US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Marco Osterwalder
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
  - Department for BioMedical Research (DBMR), University of Bern, Bern 3008, Switzerland
  - Department of Cardiology, Bern University Hospital, Bern 3010, Switzerland
- Anthony B. Firulli
  - Herman B Wells Center for Pediatric Research, Departments of Pediatrics, Anatomy and Medical and Molecular Genetics, Indiana Medical School, Indianapolis, IN 46202, USA