1
Liarou M, Matthes T, Marchand-Maillet S. TimeFlow: A Density-Driven Pseudotime Method for Flow Cytometry Data Analysis. Cytometry A 2025; 107:233-247. [PMID: 40111028 DOI: 10.1002/cyto.a.24928] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2024] [Revised: 01/15/2025] [Accepted: 02/27/2025] [Indexed: 03/22/2025]
Abstract
Pseudotime methods order cells undergoing differentiation from the least to the most differentiated. We developed TimeFlow, a new method for computing pseudotime in multi-dimensional flow cytometry datasets. TimeFlow tracks the differentiation path of each cell on a graph by following smooth changes in the cell population density. To compute the probability density function of the cells, it uses a normalizing flow model. We profiled bone marrow samples from three healthy patients using a 20-color antibody panel for flow cytometry and prepared datasets that ranged from 5,000 to 600,000 cells and included monocytes, neutrophils, erythrocytes, and B-cells at various maturation stages. TimeFlow computed fine-grained pseudotime for all the datasets, and the cell orderings were consistent with prior knowledge of human hematopoiesis. Experiments showed its potential in generalizing across patients and unseen cell states. We compared our method to 11 other pseudotime methods using in-house and public datasets and found very good performance for both linear and branching trajectories. TimeFlow's pseudotemporal orderings are useful for modeling the dynamics of cell surface proteins along linear trajectories. The biologically meaningful results in branching trajectories suggest the possibility of future applications with automated cell lineage detection. Code is available at https://github.com/MargaritaLiarou1/TimeFlow and data at https://osf.io/ykue7/.
Affiliation(s)
- Margarita Liarou
  - Department of Computer Science, Viper Group, University of Geneva, Carouge, Switzerland
- Thomas Matthes
  - Hematology Service, Oncology Department, University Hospital Geneva, Geneva, Switzerland
  - Clinical Pathology Service, Diagnostics Department, University Hospital Geneva, Geneva, Switzerland
- Stéphane Marchand-Maillet
  - Department of Computer Science, Viper Group, University of Geneva, Carouge, Switzerland
  - Centre Universitaire d'Informatique, University of Geneva, Carouge, Switzerland

2
He R, Sarwal V, Qiu X, Zhuang Y, Zhang L, Liu Y, Chiang J. Generative AI Models in Time-Varying Biomedical Data: Scoping Review. J Med Internet Res 2025; 27:e59792. [PMID: 40063929 PMCID: PMC11933772 DOI: 10.2196/59792] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/30/2024] [Revised: 08/08/2024] [Accepted: 11/15/2024] [Indexed: 03/28/2025]
Abstract
BACKGROUND: Trajectory modeling is a long-standing challenge in the application of computational methods to health care. In the age of big data, traditional statistical and machine learning methods do not achieve satisfactory results as they often fail to capture the complex underlying distributions of multimodal health data and long-term dependencies throughout medical histories. Recent advances in generative artificial intelligence (AI) have provided powerful tools to represent complex distributions and patterns with minimal underlying assumptions, with major impact in fields such as finance and environmental sciences, prompting researchers to apply these methods for disease modeling in health care.
OBJECTIVE: While AI methods have proven powerful, their application in clinical practice remains limited due to their highly complex nature. The proliferation of AI algorithms also poses a significant challenge for nondevelopers to track and incorporate these advances into clinical research and application. In this paper, we introduce basic concepts in generative AI and discuss current algorithms and how they can be applied to health care for practitioners with little background in computer science.
METHODS: We surveyed peer-reviewed papers on generative AI models with specific applications to time-series health data. Our search included single- and multimodal generative AI models that operated over structured and unstructured data, physiological waveforms, medical imaging, and multi-omics data. We introduce current generative AI methods, review their applications, and discuss their limitations and future directions in each data modality.
RESULTS: We followed the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) guidelines and reviewed 155 articles on generative AI applications to time-series health care data across modalities. Furthermore, we offer a systematic framework for clinicians to easily identify suitable AI methods for their data and task at hand.
CONCLUSIONS: We reviewed and critiqued existing applications of generative AI to time-series health data with the aim of bridging the gap between computational methods and clinical application. We also identified the shortcomings of existing approaches and highlighted recent advances in generative AI that represent promising directions for health care modeling.
Affiliation(s)
- Rosemary He
  - Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, United States
  - Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA, United States
- Varuni Sarwal
  - Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, United States
  - Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA, United States
- Xinru Qiu
  - Division of Biomedical Sciences, School of Medicine, University of California Riverside, Riverside, CA, United States
- Yongwen Zhuang
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, United States
- Le Zhang
  - Institute for Integrative Genome Biology, University of California Riverside, Riverside, CA, United States
- Yue Liu
  - Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, TX, United States
- Jeffrey Chiang
  - Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA, United States
  - Department of Neurosurgery, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, United States

3
Chen X, Ma Y, Shi Y, Zhang B, Wu H, Gao J. Fuzzy-Based Identification of Transition Cells to Infer Cell Trajectory for Single-Cell Transcriptomics. J Comput Biol 2025; 32:253-273. [PMID: 39670822 DOI: 10.1089/cmb.2023.0432] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/14/2024]
Abstract
With the continuous evolution of single-cell RNA sequencing technology, it has become feasible to reconstruct cell development processes using computational methods. Trajectory inference is a crucial downstream analytical task that provides valuable insights into the cell cycle and differentiation. During development, cells exhibit both stable and transition states, and it is challenging to identify the cells in each state accurately. To address this challenge, we propose a novel single-cell trajectory inference method using fuzzy clustering, named scFCTI. By introducing fuzzy clustering and quantifying cell uncertainty, scFCTI can identify transition cells within unstable cell states. Moreover, scFCTI refines cell classification by characterizing different cell stages, which yields a more accurate single-cell trajectory reconstruction that contains transition paths. To validate the effectiveness of scFCTI, we conduct experiments on five real datasets and four simulated datasets with different structures, comparing it with several state-of-the-art trajectory inference methods. The results demonstrate that scFCTI outperforms these methods by successfully identifying unstable cell clusters and obtaining more accurate cell paths with transition states. In particular, the experimental results demonstrate that scFCTI can reconstruct the cell trajectory more precisely.
Affiliation(s)
- Xiang Chen
  - School of Science, Jiangnan University, Wuxi, China
- Yibing Ma
  - School of Science, Jiangnan University, Wuxi, China
- Yongle Shi
  - School of Science, Jiangnan University, Wuxi, China
- Bai Zhang
  - School of Science, Jiangnan University, Wuxi, China
- Hanwen Wu
  - School of Science, Jiangnan University, Wuxi, China
- Jie Gao
  - School of Science, Jiangnan University, Wuxi, China

4
Zhang Z, Zhu Y, Lai Z, Zhou M, Chen X, Tang R, Alaynick W, Cho SH, Lo YH. Predicting cell properties with AI from 3D imaging flow cytometer data. Sci Rep 2025; 15:5715. [PMID: 39962067 PMCID: PMC11833109 DOI: 10.1038/s41598-024-80722-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2024] [Accepted: 11/21/2024] [Indexed: 02/20/2025]
Abstract
Predicting the properties of tissues or organisms from genomics data is widely accepted by the medical community. Here we ask a question: can we predict the properties of each individual cell? Single-cell genomics does not work because the RNA sequencing process destroys the cell, not allowing us to verify our predictions. To test the hypothesis, we investigate the approach of using AI to analyze single-cell images obtained from a 3D imaging flow cytometer. We analyze the cell image at day zero and make an AI-assisted prediction of cell properties. The prediction is then examined later, as the cells continue to live and develop. Our preliminary results are promising, showing 88% accuracy in predicting cells that will have a high protein expression level. The technique can have strong ramifications and impact on preventive medicine, drug development, cell therapy, and fundamental biomedical research.
Affiliation(s)
- Zunming Zhang
  - Department of Electrical and Computer Engineering, University of California, San Diego, La Jolla, CA, 92093, USA
- Yuxuan Zhu
  - Department of Electrical and Computer Engineering, University of California, San Diego, La Jolla, CA, 92093, USA
- Zhaoyu Lai
  - Department of Electrical and Computer Engineering, University of California, San Diego, La Jolla, CA, 92093, USA
- Minhong Zhou
  - Department of Electrical and Computer Engineering, University of California, San Diego, La Jolla, CA, 92093, USA
- Xinyu Chen
  - Department of Electrical and Computer Engineering, University of California, San Diego, La Jolla, CA, 92093, USA
- Rui Tang
  - NanoCellect Biomedical Inc., San Diego, CA, 92121, USA
- Sung Hwan Cho
  - NanoCellect Biomedical Inc., San Diego, CA, 92121, USA
- Yu-Hwa Lo
  - Department of Electrical and Computer Engineering, University of California, San Diego, La Jolla, CA, 92093, USA

5
Sun F, Li H, Sun D, Fu S, Gu L, Shao X, Wang Q, Dong X, Duan B, Xing F, Wu J, Xiao M, Zhao F, Han JDJ, Liu Q, Fan X, Li C, Wang C, Shi T. Single-cell omics: experimental workflow, data analyses and applications. SCIENCE CHINA. LIFE SCIENCES 2025; 68:5-102. [PMID: 39060615 DOI: 10.1007/s11427-023-2561-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2023] [Accepted: 04/18/2024] [Indexed: 07/28/2024]
Abstract
Cells are the fundamental units of biological systems and exhibit unique development trajectories and molecular features. Our exploration of how the genomes orchestrate the formation and maintenance of each cell, and control the cellular phenotypes of various organisms, is both captivating and intricate. Since the inception of the first single-cell RNA technology, technologies related to single-cell sequencing have experienced rapid advancements in recent years. These technologies have expanded horizontally to include single-cell genome, epigenome, proteome, and metabolome, while vertically, they have progressed to integrate multiple omics data and incorporate additional information such as spatial scRNA-seq and CRISPR screening. Single-cell omics represent a groundbreaking advancement in the biomedical field, offering profound insights into the understanding of complex diseases, including cancers. Here, we comprehensively summarize recent advances in single-cell omics technologies, with a specific focus on the methodology section. This overview aims to guide researchers in selecting appropriate methods for single-cell sequencing and related data analysis.
Affiliation(s)
- Fengying Sun
  - Department of Clinical Laboratory, the Affiliated Wuhu Hospital of East China Normal University (The Second People's Hospital of Wuhu City), Wuhu, 241000, China
- Haoyan Li
  - Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- Dongqing Sun
  - Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
  - Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
- Shaliu Fu
  - Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
  - Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
  - Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou, 311121, China
  - Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai, 201210, China
- Lei Gu
  - Center for Single-cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China
- Xin Shao
  - Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
  - National Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing, 314103, China
- Qinqin Wang
  - Center for Single-cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China
- Xin Dong
  - Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
  - Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
- Bin Duan
  - Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
  - Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
  - Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou, 311121, China
  - Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai, 201210, China
- Feiyang Xing
  - Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
  - Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
- Jun Wu
  - Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, 200241, China
- Minmin Xiao
  - Department of Clinical Laboratory, the Affiliated Wuhu Hospital of East China Normal University (The Second People's Hospital of Wuhu City), Wuhu, 241000, China
- Fangqing Zhao
  - Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, 100101, China
- Jing-Dong J Han
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Center for Quantitative Biology (CQB), Peking University, Beijing, 100871, China
- Qi Liu
  - Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
  - Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
  - Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou, 311121, China
  - Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai, 201210, China
- Xiaohui Fan
  - Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
  - National Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing, 314103, China
  - Zhejiang Key Laboratory of Precision Diagnosis and Therapy for Major Gynecological Diseases, Women's Hospital, Zhejiang University School of Medicine, Hangzhou, 310006, China
- Chen Li
  - Center for Single-cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China
- Chenfei Wang
  - Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
  - Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
- Tieliu Shi
  - Department of Clinical Laboratory, the Affiliated Wuhu Hospital of East China Normal University (The Second People's Hospital of Wuhu City), Wuhu, 241000, China
  - Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, 200241, China
  - Key Laboratory of Advanced Theory and Application in Statistics and Data Science-MOE, School of Statistics, East China Normal University, Shanghai, 200062, China

6
Lee S, Lee DY, So I, Chun JN, Jeon JH. Chromatin accessibility is associated with therapeutic response in prostate cancer. Oncol Lett 2024; 28:605. [PMID: 39483964 PMCID: PMC11525612 DOI: 10.3892/ol.2024.14738] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2024] [Accepted: 09/12/2024] [Indexed: 11/03/2024]
Abstract
Treatment of advanced prostate cancer is challenging due to a lack of effective therapies. Therefore, it is important to understand the molecular mechanisms underlying therapeutic resistance in prostate cancer and to identify promising drug targets offering significant clinical advantages. Given the pivotal role of dysregulated transcriptional programs in the therapeutic response, it is essential to prioritize translational efforts targeting cancer-associated transcription factors (TFs). The present study investigated whether chromatin accessibility was associated with therapeutic resistance in prostate cancer using Assay for Transposase-Accessible Chromatin with sequencing (ATAC-seq) data. The bioinformatics analysis identified differences in chromatin accessibility between the drug response (Remission) and drug resistance (Disease) groups. Additionally, a significant association was observed between chromatin accessibility, transcriptional output and TF activity. Among TFs, forkhead box protein M1 (FOXM1) was identified as a TF with high activity and expression in the Disease group. Notably, the results of the computational analysis were validated by FOXM1 knockdown experiments, which resulted in suppressed cell proliferation and enhanced therapeutic sensitivity in prostate cancer cells. The present findings demonstrated that chromatin accessibility and TF activity may be associated with therapeutic resistance in prostate cancer. Additionally, these results provide the basis for future investigations aimed at understanding the molecular mechanisms of drug resistance and developing novel therapeutic approaches for prostate cancer.
Affiliation(s)
- Sanghoon Lee
  - Department of Physiology and Biomedical Sciences, Seoul National University College of Medicine, Seoul 03080, Republic of Korea
- Da Young Lee
  - Department of Physiology and Biomedical Sciences, Seoul National University College of Medicine, Seoul 03080, Republic of Korea
- Insuk So
  - Department of Physiology and Biomedical Sciences, Seoul National University College of Medicine, Seoul 03080, Republic of Korea
  - Institute of Human-Environment Interface Biology, Seoul National University, Seoul 03080, Republic of Korea
- Jung Nyeo Chun
  - Department of Physiology and Biomedical Sciences, Seoul National University College of Medicine, Seoul 03080, Republic of Korea
  - Institute of Human-Environment Interface Biology, Seoul National University, Seoul 03080, Republic of Korea
- Ju-Hong Jeon
  - Department of Physiology and Biomedical Sciences, Seoul National University College of Medicine, Seoul 03080, Republic of Korea
  - Institute of Human-Environment Interface Biology, Seoul National University, Seoul 03080, Republic of Korea

7
Sun R, Cao W, Li S, Jiang J, Shi Y, Zhang B. scGRN-Entropy: Inferring cell differentiation trajectories using single-cell data and gene regulation network-based transfer entropy. PLoS Comput Biol 2024; 20:e1012638. [PMID: 39585902 PMCID: PMC11627384 DOI: 10.1371/journal.pcbi.1012638] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2024] [Revised: 12/09/2024] [Accepted: 11/12/2024] [Indexed: 11/27/2024]
Abstract
Research on cell differentiation facilitates a deeper understanding of the fundamental processes of life, elucidates the intrinsic mechanisms underlying diseases such as cancer, and advances the development of therapeutics and precision medicine. Existing methods for inferring cell differentiation trajectories from single-cell RNA sequencing (scRNA-seq) data primarily rely on static gene expression data to measure distances between cells and subsequently infer pseudotime trajectories. In this work, we introduce a novel method, scGRN-Entropy, for inferring cell differentiation trajectories and pseudotime from scRNA-seq data. Unlike existing approaches, scGRN-Entropy improves inference accuracy by incorporating dynamic changes in gene regulatory networks (GRN). In scGRN-Entropy, an undirected graph representing state transitions between cells is constructed by integrating both static relationships in gene expression space and dynamic relationships in the GRN space. The edges of the undirected graph are then refined using pseudotime inferred based on cell entropy in the GRN space. Finally, the Minimum Spanning Tree (MST) algorithm is applied to derive the cell differentiation trajectory. We validate the accuracy of scGRN-Entropy on eight different real scRNA-seq datasets, demonstrating its superior performance in inferring cell differentiation trajectories through comparative analysis with existing state-of-the-art methods.
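The last step named in this abstract, applying a Minimum Spanning Tree (MST) algorithm to a weighted graph of cells, can be illustrated with a minimal sketch. It uses Kruskal's algorithm on a toy graph; the function name `minimum_spanning_tree`, the four "cells", and the edge weights are hypothetical, and nothing here reproduces how scGRN-Entropy builds or refines its graph.

```python
def minimum_spanning_tree(n_nodes, edges):
    """Kruskal's algorithm.

    edges: iterable of (weight, u, v) for an undirected graph whose
    nodes are labeled 0 .. n_nodes - 1.
    Returns the tree as a list of (u, v, weight) tuples.
    """
    parent = list(range(n_nodes))  # union-find forest, one set per node

    def find(x):
        # Walk to the set representative, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for weight, u, v in sorted(edges):  # cheapest edges first
        root_u, root_v = find(u), find(v)
        if root_u != root_v:            # edge joins two components: keep it
            parent[root_u] = root_v
            tree.append((u, v, weight))
    return tree


# Hypothetical toy graph: 4 cells, weights stand in for cell-cell distances.
toy_edges = [(1.0, 0, 1), (4.0, 0, 2), (2.0, 1, 2), (5.0, 1, 3), (3.0, 2, 3)]
mst = minimum_spanning_tree(4, toy_edges)
total_weight = sum(weight for _, _, weight in mst)
```

On this toy graph the tree keeps three edges and links the cells in a single chain, which is the kind of backbone a trajectory method would then orient using pseudotime.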
Affiliation(s)
- Rui Sun
  - School of Mathematical & Physical Sciences, Wuhan Textile University, Wuhan, Hubei, China
  - Center for Applied Mathematics and Interdisciplinary Studies, Wuhan Textile University, Wuhan, Hubei, China
- Wenjie Cao
  - School of Mathematics, Sun Yat-sen University, Guangzhou, Guangdong, China
- ShengXuan Li
  - School of Mathematical & Physical Sciences, Wuhan Textile University, Wuhan, Hubei, China
  - Center for Applied Mathematics and Interdisciplinary Studies, Wuhan Textile University, Wuhan, Hubei, China
- Jian Jiang
  - School of Mathematical & Physical Sciences, Wuhan Textile University, Wuhan, Hubei, China
  - Center for Applied Mathematics and Interdisciplinary Studies, Wuhan Textile University, Wuhan, Hubei, China
- Yazhou Shi
  - School of Mathematical & Physical Sciences, Wuhan Textile University, Wuhan, Hubei, China
  - Center for Applied Mathematics and Interdisciplinary Studies, Wuhan Textile University, Wuhan, Hubei, China
- Bengong Zhang
  - School of Mathematical & Physical Sciences, Wuhan Textile University, Wuhan, Hubei, China
  - Center for Applied Mathematics and Interdisciplinary Studies, Wuhan Textile University, Wuhan, Hubei, China

8
Iida K, Okada M. Identifying Key Regulatory Genes in Drug Resistance Acquisition: Modeling Pseudotime Trajectories of Breast Cancer Single-Cell Transcriptome. Cancers (Basel) 2024; 16:1884. [PMID: 38791962 PMCID: PMC11119661 DOI: 10.3390/cancers16101884] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2024] [Revised: 05/11/2024] [Accepted: 05/15/2024] [Indexed: 05/26/2024]
Abstract
Single-cell RNA-sequencing (scRNA-seq) technology has provided significant insights into cancer drug resistance at the single-cell level. However, understanding dynamic cell transitions at the molecular systems level remains limited, requiring a systems biology approach. We present an approach that combines mathematical modeling with a pseudotime analysis using time-series scRNA-seq data obtained from the breast cancer cell line MCF-7 treated with tamoxifen. Our single-cell analysis identified five distinct subpopulations, including tamoxifen-sensitive and -resistant groups. Using a single-gene mathematical model, we discovered approximately 560-680 genes out of 6000 exhibiting multistable expression states in each subpopulation, including key estrogen-receptor-positive breast cancer cell survival genes, such as RPS6KB1. A bifurcation analysis elucidated their regulatory mechanisms, and we mapped these genes into a molecular network associated with cell survival and metastasis-related pathways. Our modeling approach comprehensively identifies key regulatory genes for drug resistance acquisition, enhancing our understanding of potential drug targets in breast cancer.
Affiliation(s)
- Keita Iida
  - Institute for Protein Research, Osaka University, Suita 565-0871, Osaka, Japan

9
Cui Z, Wei H, Goding C, Cui R. Stem cell heterogeneity, plasticity, and regulation. Life Sci 2023; 334:122240. [PMID: 37925141 DOI: 10.1016/j.lfs.2023.122240] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2023] [Revised: 10/30/2023] [Accepted: 10/31/2023] [Indexed: 11/06/2023]
Abstract
As a population of homogeneous cells with both self-renewal and differentiation potential, stem cell pools are highly compartmentalized and contain distinct subsets that exhibit stable but limited heterogeneity during homeostasis. However, their striking plasticity is showcased under natural or artificial stress, such as injury, transplantation, cancer, and aging, leading to changes in their phenotype, constitution, metabolism, and function. The complex and diverse network of cell-extrinsic niches and signaling pathways, together with cell-intrinsic genetic and epigenetic regulators, tightly regulates both the heterogeneity during homeostasis and the plasticity under perturbation. Manipulating these factors offers better control of stem cell behavior and a potential revolution in the current state of regenerative medicine. However, disruptions of normal regulation by genetic mutation or excessive plasticity acquisition may contribute to the formation of tumors. By harnessing innovative techniques that enhance our understanding of stem cell heterogeneity and employing novel approaches to maximize the utilization of stem cell plasticity, stem cell therapy holds immense promise for revolutionizing the future of medicine.
Affiliation(s)
- Ziyang Cui
  - Department of Dermatology and Venerology, Peking University First Hospital, Beijing 100034, China
- Hope Wei
  - Department of Biology, Boston University, 5 Cummington Mall, Boston, MA 02215, United States of America
- Colin Goding
  - Ludwig Institute for Cancer Research, Nuffield Department of Clinical Medicine, University of Oxford, Headington, Oxford OX37DQ, UK
- Rutao Cui
  - Skin Disease Research Institute, The 2nd Hospital, Zhejiang University School of Medicine, Hangzhou 310058, China

10
Zou X, Liu Y, Wang M, Zou J, Shi Y, Su X, Xu J, Tong HHY, Ji Y, Gui L, Hao J. scCURE identifies cell types responding to immunotherapy and enables outcome prediction. CELL REPORTS METHODS 2023; 3:100643. [PMID: 37989083 PMCID: PMC10694528 DOI: 10.1016/j.crmeth.2023.100643] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2023] [Revised: 07/17/2023] [Accepted: 10/23/2023] [Indexed: 11/23/2023]
Abstract
A deep understanding of immunotherapy response/resistance mechanisms and a highly reliable therapy response prediction are vital for cancer treatment. Here, we developed scCURE (single-cell RNA sequencing [scRNA-seq] data-based Changed and Unchanged cell Recognition during immunotherapy). Based on Gaussian mixture modeling, Kullback-Leibler (KL) divergence, and mutual nearest-neighbors criteria, scCURE can faithfully discriminate between cells affected or unaffected by immunotherapy intervention. By conducting scCURE analyses in melanoma and breast cancer immunotherapy scRNA-seq data, we found that the baseline profiles of specific CD8+ T and macrophage cells (identified by scCURE) can determine the way in which tumor microenvironment immune cells respond to immunotherapy, e.g., antitumor immunity activation or de-activation; therefore, these cells could be predictive factors for treatment response. In this work, we demonstrated that the immunotherapy-associated cell-cell heterogeneities revealed by scCURE can be utilized to integrate the therapy response mechanism study and prediction model construction.
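One ingredient named in this abstract, the Kullback-Leibler (KL) divergence, has a closed form for two univariate Gaussians: KL(P || Q) = ln(σ_q/σ_p) + (σ_p² + (μ_p − μ_q)²) / (2σ_q²) − 1/2. The sketch below only evaluates that textbook formula; the function name `kl_gaussian` and the example parameters are hypothetical, and the abstract does not say how scCURE combines KL divergence with its Gaussian mixture components, so this is not the authors' procedure.

```python
import math


def kl_gaussian(mu_p, sigma_p, mu_q, sigma_q):
    """KL(P || Q) for univariate Gaussians P = N(mu_p, sigma_p^2), Q = N(mu_q, sigma_q^2).

    Both standard deviations must be strictly positive.
    """
    if sigma_p <= 0 or sigma_q <= 0:
        raise ValueError("standard deviations must be positive")
    return (
        math.log(sigma_q / sigma_p)
        + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2 * sigma_q ** 2)
        - 0.5
    )


# Hypothetical example: identical components diverge by 0; shifting the
# mean by one standard deviation gives 0.5 nats.
same = kl_gaussian(0.0, 1.0, 0.0, 1.0)
shifted = kl_gaussian(0.0, 1.0, 1.0, 1.0)
```

Because the divergence is asymmetric, `kl_gaussian(a, b, c, d)` and `kl_gaussian(c, d, a, b)` generally differ when the variances differ, so any comparison of cell populations has to fix which distribution plays the reference role.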
Affiliation(s)
- Xin Zou
- Center for Tumor Diagnosis & Therapy, Jinshan Hospital, Fudan University, Shanghai 201508, China; Department of Pathology, Jinshan Hospital, Fudan University, Shanghai 201508, China.
| | - Yujun Liu
- Department of Radiation Oncology, Fudan University Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Miaochen Wang
- Department of Oral and Maxillofacial-Head & Neck Oncology, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine; College of Stomatology, Shanghai Jiao Tong University; National Center for Stomatology; National Clinical Research Center for Oral Diseases; Shanghai Key Laboratory of Stomatology, Shanghai, China
| | - Jiawei Zou
- Institute of Clinical Science, Zhongshan Hospital, Fudan University, Shanghai 200032, China
| | - Yi Shi
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Shanghai Jiao Tong University, 1954 Huashan Road, Shanghai 200030, China
| | - Xianbin Su
- Key Laboratory of Systems Biomedicine (Ministry of Education), Shanghai Center for Systems Biomedicine, Shanghai JiaoTong University, Shanghai, China
| | - Juan Xu
- Department of Stomatology, Sijing Hospital, Shanghai 201601, China
| | - Henry H Y Tong
- Centre for Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, Macao SAR, China
| | - Yuan Ji
- Molecular Pathology Center, Department Pathology, Zhongshan Hospital, Fudan University, Shanghai, China
| | - Lv Gui
- Department of Pathology, Jinshan Hospital, Fudan University, Shanghai 201508, China.
| | - Jie Hao
- Institute of Clinical Science, Zhongshan Hospital, Fudan University, Shanghai 200032, China.
|
11
|
Daniels BC, Wang Y, Page RE, Amdam GV. Identifying a developmental transition in honey bees using gene expression data. PLoS Comput Biol 2023; 19:e1010704. [PMID: 37733808 PMCID: PMC10547183 DOI: 10.1371/journal.pcbi.1010704] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2022] [Revised: 10/03/2023] [Accepted: 09/05/2023] [Indexed: 09/23/2023]
Abstract
In many organisms, interactions among genes lead to multiple functional states, and changes to interactions can lead to transitions into new states. These transitions can be related to bifurcations (or critical points) in dynamical systems theory. Characterizing these collective transitions is a major challenge for systems biology. Here, we develop a statistical method for identifying bistability near a continuous transition directly from high-dimensional gene expression data. We apply the method to data from honey bees, where a known developmental transition occurs between bees performing tasks in the nest and leaving the nest to forage. Our method, which makes use of the expected shape of the distribution of gene expression levels near a transition, successfully identifies the emergence of bistability and links it to genes that are known to be involved in the behavioral transition. This proof of concept demonstrates that going beyond correlative analysis to infer the shape of gene expression distributions might be used more generally to identify collective transitions from gene expression data.
Affiliation(s)
- Bryan C. Daniels
- School of Complex Adaptive Systems, Arizona State University, Tempe, Arizona, United States of America
| | - Ying Wang
- Banner Health Corporation, Phoenix, Arizona, United States of America
| | - Robert E. Page
- School of Life Sciences, Arizona State University, Tempe, Arizona, United States of America
- Department of Entomology and Nematology, University of California Davis, Davis, California, United States of America
| | - Gro V. Amdam
- School of Life Sciences, Arizona State University, Tempe, Arizona, United States of America
- Department of Ecology and Natural Resource Management, Norwegian University of Life Sciences, Aas, Norway
|
12
|
Yang T, Hathcock D, Chen Y, McEuen PL, Sethna JP, Cohen I, Griniasty I. Bifurcation instructed design of multistate machines. Proc Natl Acad Sci U S A 2023; 120:e2300081120. [PMID: 37579174 PMCID: PMC10450659 DOI: 10.1073/pnas.2300081120] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/03/2023] [Accepted: 07/23/2023] [Indexed: 08/16/2023]
Abstract
We propose a design paradigm for multistate machines where transitions from one state to another are organized by bifurcations of multiple equilibria of the energy landscape describing the collective interactions of the machine components. This design paradigm is attractive since, near bifurcations, small variations in a few control parameters can result in large changes to the system's state providing an emergent lever mechanism. Further, the topological configuration of transitions between states near such bifurcations ensures robust operation, making the machine less sensitive to fabrication errors and noise. To design such machines, we develop and implement a new efficient algorithm that searches for interactions between the machine components that give rise to energy landscapes with these bifurcation structures. We demonstrate a proof of concept for this approach by designing magnetoelastic machines whose motions are primarily guided by their magnetic energy landscapes and show that by operating near bifurcations we can achieve multiple transition pathways between states. This proof of concept demonstration illustrates the power of this approach, which could be especially useful for soft robotics and at the microscale where typical macroscale designs are difficult to implement.
Affiliation(s)
- Teaya Yang
- Laboratory of Atomic and Solid State Physics, Cornell University, Ithaca, NY 14853
| | - David Hathcock
- Laboratory of Atomic and Solid State Physics, Cornell University, Ithaca, NY 14853
| | - Yuchao Chen
- Laboratory of Atomic and Solid State Physics, Cornell University, Ithaca, NY 14853
| | - Paul L. McEuen
- Laboratory of Atomic and Solid State Physics, Cornell University, Ithaca, NY 14853
- Kavli Institute at Cornell for Nanoscale Science, Cornell University, Ithaca, NY 14853
| | - James P. Sethna
- Laboratory of Atomic and Solid State Physics, Cornell University, Ithaca, NY 14853
| | - Itai Cohen
- Laboratory of Atomic and Solid State Physics, Cornell University, Ithaca, NY 14853
- Kavli Institute at Cornell for Nanoscale Science, Cornell University, Ithaca, NY 14853
| | - Itay Griniasty
- Laboratory of Atomic and Solid State Physics, Cornell University, Ithaca, NY 14853
|
13
|
Proverbio D, Skupin A, Gonçalves J. Systematic analysis and optimization of early warning signals for critical transitions using distribution data. iScience 2023; 26:107156. [PMID: 37456849 PMCID: PMC10338236 DOI: 10.1016/j.isci.2023.107156] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2022] [Revised: 04/21/2023] [Accepted: 06/12/2023] [Indexed: 07/18/2023]
Abstract
Abrupt shifts between alternative regimes occur in complex systems, from cell regulation to brain functions to ecosystems. Several model-free early warning signals (EWS) have been proposed to detect impending transitions, but failure or poor performance in some systems has called for better investigation of their generic applicability. Notably, there are still ongoing debates about whether such signals can be successfully extracted from data, in particular from biological experiments. In this work, we systematically investigate properties and performance of dynamical EWS in different deteriorating conditions, and we propose an optimized combination to trigger warnings as early as possible, eventually verified on experimental data from microbiological populations. Our results explain discrepancies observed in the literature between warning signs extracted from simulated models and from real data, provide guidance for EWS selection based on desired systems, and suggest an optimized composite indicator to alert for impending critical transitions using distribution data.
Affiliation(s)
- Daniele Proverbio
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, 6 Avenue Du Swing, 4367 Belvaux, Luxembourg
- College of Engineering, Mathematics and Physical Sciences, University of Exeter, Exeter EX4 4QL, UK
| | - Alexander Skupin
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, 6 Avenue Du Swing, 4367 Belvaux, Luxembourg
- National Center for Microscopy and Imaging Research, University of California San Diego, Gilman Drive, La Jolla, CA 9500, USA
- Department of Physics and Material Science, University of Luxembourg, 162a Avenue de La Faiencerie, 1511 Luxembourg, Luxembourg
| | - Jorge Gonçalves
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, 6 Avenue Du Swing, 4367 Belvaux, Luxembourg
- Department of Plant Sciences, University of Cambridge, Cambridge CB2 3EA, UK
|
14
|
Multi-Objective Genetic Algorithm for Cluster Analysis of Single-Cell Transcriptomes. J Pers Med 2023; 13:jpm13020183. [PMID: 36836417 PMCID: PMC9960600 DOI: 10.3390/jpm13020183] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/21/2022] [Revised: 01/15/2023] [Accepted: 01/16/2023] [Indexed: 01/22/2023]
Abstract
Cells are the basic building blocks of human organisms, and the identification of their types and states in transcriptomic data is an important and challenging task. Many of the existing approaches to cell-type prediction are based on clustering methods that optimize only one criterion. In this paper, a multi-objective Genetic Algorithm for cluster analysis is proposed, implemented, and systematically validated on 48 experimental and 60 synthetic datasets. The results demonstrate that the performance and the accuracy of the proposed algorithm are reproducible, stable, and better than those of single-objective clustering methods. Computational run times of multi-objective clustering of large datasets were studied and used in supervised machine learning to accurately predict the execution times of clustering of new single-cell transcriptomes.
|
15
|
Ao C, Jiao S, Wang Y, Yu L, Zou Q. Biological Sequence Classification: A Review on Data and General Methods. Research (Washington, D.C.) 2022; 2022:0011. [PMID: 39285948 PMCID: PMC11404319 DOI: 10.34133/research.0011] [Citation(s) in RCA: 47] [Impact Index Per Article: 15.7] [Received: 07/25/2022] [Accepted: 10/25/2022] [Indexed: 09/19/2024]
Abstract
With the rapid development of biotechnology, the number of biological sequences has grown exponentially. The continuous expansion of biological sequence data promotes the application of machine learning in biological sequences to construct predictive models for mining biological sequence information. There are many branches of biological sequence classification research. In this review, we mainly focus on the function and modification classification of biological sequences based on machine learning. Sequence-based prediction and analysis are the basic tasks to understand the biological functions of DNA, RNA, proteins, and peptides. However, there are hundreds of classification models developed for biological sequences, and the quite varied specific methods seem dizzying at first glance. Here, we aim to establish a long-term support website (http://lab.malab.cn/~acy/BioseqData/home.html), which provides readers with detailed information on the classification method and download links to relevant datasets. We briefly introduce the steps to build an effective model framework for biological sequence data. In addition, a brief introduction to single-cell sequencing data analysis methods and applications in biology is also included. Finally, we discuss the current challenges and future perspectives of biological sequence classification research.
Affiliation(s)
- Chunyan Ao
- School of Computer Science and Technology, Xidian University, Xi'an, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| | - Shihu Jiao
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
| | - Yansu Wang
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| | - Liang Yu
- School of Computer Science and Technology, Xidian University, Xi'an, China
| | - Quan Zou
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
|
16
|
Giri R, Brady S, Papadopoulos DK, Carthew RW. Single-cell Senseless protein analysis reveals metastable states during the transition to a sensory organ fate. iScience 2022; 25:105097. [PMID: 36157584 PMCID: PMC9494244 DOI: 10.1016/j.isci.2022.105097] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/02/2021] [Revised: 08/02/2022] [Accepted: 09/05/2022] [Indexed: 11/29/2022]
Abstract
Cell fate decisions can be envisioned as bifurcating dynamical systems, and the decision that Drosophila cells make during sensory organ differentiation has been described as such. We extended these studies by focusing on the Senseless protein, which orchestrates sensory cell fate transitions. Wing cells contain intermediate Senseless numbers before their fate transition, after which they express much greater numbers of Senseless molecules as they differentiate. However, the dynamics are inconsistent with it being a simple bistable system. Cells with intermediate Senseless are best modeled as residing in four discrete states, each with a distinct protein number and occupying a specific region of the tissue. Although the states are stable over time, the number of molecules in each state varies with time. The fold change in molecule number between adjacent states is invariant and robust to absolute protein number variation. Thus, cells transitioning to sensory fates exhibit metastability with relativistic properties.
Affiliation(s)
- Ritika Giri
- Department of Molecular Biosciences, Northwestern University, Evanston, IL 60208, USA; NSF-Simons Center for Quantitative Biology, Northwestern University, Evanston, IL 60208, USA
| | - Shannon Brady
- Department of Molecular Biosciences, Northwestern University, Evanston, IL 60208, USA
| | - Dimitrios K. Papadopoulos
- Center for Molecular Medicine (CMM), Department of Clinical Neuroscience, Karolinska Institute, 17176 Stockholm, Sweden; Department of Biology, University of Crete, Voutes University Campus, Heraklion, Crete 70013, Greece
| | - Richard W. Carthew
- Department of Molecular Biosciences, Northwestern University, Evanston, IL 60208, USA; NSF-Simons Center for Quantitative Biology, Northwestern University, Evanston, IL 60208, USA; Corresponding author
|
17
|
Sáez M, Briscoe J, Rand DA. Dynamical landscapes of cell fate decisions. Interface Focus 2022; 12:20220002. [PMID: 35860004 PMCID: PMC9184965 DOI: 10.1098/rsfs.2022.0002] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 01/05/2022] [Accepted: 04/25/2022] [Indexed: 12/11/2022]
Abstract
The generation of cellular diversity during development involves differentiating cells transitioning between discrete cell states. In the 1940s, the developmental biologist Conrad Waddington introduced a landscape metaphor to describe this process. The developmental path of a cell was pictured as a ball rolling through a terrain of branching valleys with cell fate decisions represented by the branch points at which the ball decides between one of two available valleys. Here we discuss progress in constructing quantitative dynamical models inspired by this view of cellular differentiation. We describe a framework based on catastrophe theory and dynamical systems methods that provides the foundations for quantitative geometric models of cellular differentiation. These models can be fit to experimental data and used to make quantitative predictions about cellular differentiation. The theory indicates that cell fate decisions can be described by a small number of decision structures, such that there are only two distinct ways in which cells make a binary choice between one of two fates. We discuss the biological relevance of these mechanisms and suggest the approach is broadly applicable for the quantitative analysis of differentiation dynamics and for determining principles of developmental decisions.
Affiliation(s)
- M. Sáez
- The Francis Crick Institute, 1 Midland Road, London NW1 1AT, UK
- IQS, Universitat Ramon Llull, Via Augusta 390, Barcelona 08017, Spain
| | - J. Briscoe
- The Francis Crick Institute, 1 Midland Road, London NW1 1AT, UK
| | - D. A. Rand
- Mathematics Institute, University of Warwick, Coventry CV4 7AL, UK
- Zeeman Institute for Systems Biology and Infectious Epidemiology Research, University of Warwick, Coventry CV4 7AL, UK
|
18
|
Cho H, Kuo YH, Rockne RC. Comparison of cell state models derived from single-cell RNA sequencing data: graph versus multi-dimensional space. Mathematical Biosciences and Engineering: MBE 2022; 19:8505-8536. [PMID: 35801475 PMCID: PMC9308174 DOI: 10.3934/mbe.2022395] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/15/2023]
Abstract
Single-cell sequencing technologies have revolutionized molecular and cellular biology and stimulated the development of computational tools to analyze the data generated from these technology platforms. However, despite the recent explosion of computational analysis tools, relatively few mathematical models have been developed to utilize these data. Here we compare and contrast two cell state geometries for building mathematical models of cell state-transitions with single-cell RNA-sequencing data, with hematopoiesis as a model system: (i) by using partial differential equations on a graph representing intermediate cell states between known cell types, and (ii) by using the equations on a multi-dimensional continuous cell state-space. As an application of our approach, we demonstrate how the calibrated models may be used to mathematically perturb normal hematopoiesis to simulate, predict, and study the emergence of novel cell states during the pathogenesis of acute myeloid leukemia. We particularly focus on comparing the strengths and weaknesses of the graph model and the multi-dimensional model.
Affiliation(s)
- Heyrim Cho
- Department of Mathematics, University of California Riverside, Riverside, CA, USA
- Interdisciplinary Center for Quantitative Modeling in Biology, University of California Riverside, Riverside, CA, USA
| | - Ya-Huei Kuo
- Department of Hematologic Malignancies Translational Science, City of Hope, Duarte, CA, USA
| | - Russell C. Rockne
- Department of Computational and Quantitative Medicine, Division of Mathematical Oncology, City of Hope, Duarte, CA, USA
- Interdisciplinary Center for Quantitative Modeling in Biology, University of California Riverside, Riverside, CA, USA
|
19
|
Zhu M, Lai Y. Improvements Achieved by Multiple Imputation for Single-Cell RNA-Seq Data in Clustering Analysis and Differential Expression Analysis. J Comput Biol 2022; 29:634-649. [PMID: 35575729 DOI: 10.1089/cmb.2021.0597] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]
Abstract
In a single-cell RNA-seq (scRNA-seq) data set, a high proportion of missing values (or an excessive number of zeroes) is frequently observed. For the related follow-up tasks, such as clustering analysis and differential expression analysis, a data set without missing values is generally required. Many imputation approaches have been proposed for this purpose. Multiple imputation (MI) is a well-established approach to address possible biases in a follow-up analysis result based on one-time imputed data. There is a lack of investigation on this in the analysis of scRNA-seq data. In this study, we have investigated how to efficiently apply the MI approach to the clustering analysis and the differential expression analysis of scRNA-seq data. We proposed an MI procedure for clustering analysis and an MI procedure for differential expression analysis. To demonstrate the improvements achieved by MI in clustering analysis and differential expression analysis of scRNA-seq data, we analyzed three well-known scRNA-seq data sets. scIGANs, an scRNA-seq imputation method based on the generative adversarial networks (GANs), has been recently proposed for scRNA-seq data imputation. Multiple randomly imputed data sets can be conveniently generated by this method. We implemented our MI procedures based on scIGANs. We demonstrated that MI yielded improved performances on the clustering analysis and differential expression analysis results. Our applications to experimental scRNA-seq data illustrated the advantages of MI over one-time imputation of missing values in scRNA-seq data.
Affiliation(s)
- Mengqiu Zhu
- Department of Statistics, The George Washington University, Washington, District of Columbia, USA
| | - Yinglei Lai
- School of Mathematical Science, University of Science and Technology of China, Hefei, China
|
20
|
Dai C, Jiang Y, Yin C, Su R, Zeng X, Zou Q, Nakai K, Wei L. scIMC: a platform for benchmarking comparison and visualization analysis of scRNA-seq data imputation methods. Nucleic Acids Res 2022; 50:4877-4899. [PMID: 35524568 PMCID: PMC9122610 DOI: 10.1093/nar/gkac317] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 09/22/2021] [Revised: 04/08/2022] [Accepted: 04/20/2022] [Indexed: 12/13/2022]
Abstract
With the advent of single-cell RNA sequencing (scRNA-seq), one major challenge is the so-called 'dropout' events that distort gene expression and remarkably influence downstream analysis of the single-cell transcriptome. To address this issue, much effort has been made, and several scRNA-seq imputation methods have been developed in two categories: model-based and deep learning-based. However, a comprehensive and systematic comparison of existing methods is still lacking. In this work, we use six simulated and two real scRNA-seq datasets to comprehensively evaluate and compare a total of 12 available imputation methods from the following four aspects: (i) gene expression recovery, (ii) cell clustering, (iii) gene differential expression, and (iv) cellular trajectory reconstruction. We demonstrate that deep learning-based approaches generally exhibit better overall performance than model-based approaches under major benchmarking comparison, indicating the power of deep learning for imputation. Importantly, we built scIMC (single-cell Imputation Methods Comparison platform), the first online platform that integrates all available state-of-the-art imputation methods for benchmarking comparison and visualization analysis, which is expected to be a convenient and useful tool for interested researchers. It is now freely accessible via https://server.wei-group.net/scIMC/.
Affiliation(s)
- Chichi Dai
- College of Intelligence and Computing, Tianjin University, Tianjin, China
| | - Yi Jiang
- School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China
| | - Chenglin Yin
- School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China
| | - Ran Su
- College of Intelligence and Computing, Tianjin University, Tianjin, China
| | - Xiangxiang Zeng
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, China
| | - Kenta Nakai
- Human Genome Center, Institute of Medical Science, University of Tokyo, Tokyo, Japan
| | - Leyi Wei
- School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China
|
21
|
Single Cell Self-Paced Clustering with Transcriptome Sequencing Data. Int J Mol Sci 2022; 23:ijms23073900. [PMID: 35409258 PMCID: PMC8999118 DOI: 10.3390/ijms23073900] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2022] [Revised: 03/28/2022] [Accepted: 03/29/2022] [Indexed: 11/17/2022]
Abstract
Single cell RNA sequencing (scRNA-seq) allows researchers to explore tissue heterogeneity, distinguish unusual cell identities, and find novel cellular subtypes by providing transcriptome profiling for individual cells. Clustering analysis is usually used to predict cell class assignments and infer cell identities. However, the performance of existing single-cell clustering methods is extremely sensitive to the presence of noise data and outliers. Existing clustering algorithms can easily fall into local optimal solutions. There is still no consensus on the best performing method. To address this issue, we introduce a single cell self-paced clustering (scSPaC) method with F-norm based nonnegative matrix factorization (NMF) for scRNA-seq data and a sparse single cell self-paced clustering (sscSPaC) method with l21-norm based nonnegative matrix factorization for scRNA-seq data. We gradually add single cells from simple to complex to our model until all cells are selected. In this way, the influences of noisy data and outliers can be significantly reduced. The proposed method achieved the best performance on both simulation data and real scRNA-seq data. A case study about human clara cells and ependymal cells scRNA-seq data clustering shows that scSPaC is more advantageous near the clustering dividing line.
|
22
|
A design principle of spindle oscillations in mammalian sleep. iScience 2022; 25:103873. [PMID: 35243235 PMCID: PMC8861656 DOI: 10.1016/j.isci.2022.103873] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2021] [Revised: 01/07/2022] [Accepted: 02/01/2022] [Indexed: 11/22/2022]
Abstract
Neural oscillations are mainly regulated by molecular mechanisms and network connectivity of neurons. Large-scale simulations of neuronal networks have driven the population-level understanding of neural oscillations. However, cell-intrinsic mechanisms, especially a design principle, of neural oscillations remain largely elusive. Herein, we developed a minimal, Hodgkin-Huxley-type model of groups of neurons to investigate molecular mechanisms underlying spindle oscillation, which is synchronized oscillatory activity predominantly observed during mammalian sleep. We discovered that slowly inactivating potassium channels played an essential role in characterizing the firing pattern. The detailed analysis of the minimal model revealed that leak sodium and potassium channels, which controlled passive properties of the fast variable (i.e., membrane potential), competitively regulated the base value and time constant of the slow variable (i.e., cytosolic calcium concentration). Consequently, we propose a theoretical design principle of spindle oscillations that may explain intracellular mechanisms behind the flexible control over oscillation density and calcium setpoint.
A minimal, Hodgkin-Huxley-type model of spindle oscillations is developed. The property of delayed rectifier K+ channels characterizes spindle oscillations. The combination of bifurcations specifies spindle oscillations. Spindle oscillations are controlled by the balance of inward and outward currents.
|
23
|
Anchang B, Mendez-Giraldez R, Xu X, Archer TK, Chen Q, Hu G, Plevritis SK, Motsinger-Reif AA, Li JL. Visualization, benchmarking and characterization of nested single-cell heterogeneity as dynamic forest mixtures. Brief Bioinform 2022; 23:6534382. [PMID: 35192692 PMCID: PMC8921621 DOI: 10.1093/bib/bbac017] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/02/2021] [Revised: 11/19/2021] [Accepted: 01/13/2022] [Indexed: 11/13/2022]
Abstract
A major topic of debate in developmental biology centers on whether development is continuous, discontinuous, or a mixture of both. Pseudo-time trajectory models, optimal for visualizing cellular progression, model cell transitions as continuous state manifolds and do not explicitly model real-time, complex, heterogeneous systems and are challenging for benchmarking with temporal models. We present a data-driven framework that addresses these limitations with temporal single-cell data collected at discrete time points as inputs and a mixture of dependent minimum spanning trees (MSTs) as outputs, denoted as dynamic spanning forest mixtures (DSFMix). DSFMix uses decision-tree models to select genes that account for variations in multimodality, skewness and time. The genes are subsequently used to build the forest using tree agglomerative hierarchical clustering and dynamic branch cutting. We first motivate the use of forest-based algorithms compared to single-tree approaches for visualizing and characterizing developmental processes. We next benchmark DSFMix to pseudo-time and temporal approaches in terms of feature selection, time correlation, and network similarity. Finally, we demonstrate how DSFMix can be used to visualize, compare and characterize complex relationships during biological processes such as epithelial-mesenchymal transition, spermatogenesis, stem cell pluripotency, early transcriptional response from hormones and immune response to coronavirus disease. Our results indicate that the expression of genes during normal development exhibits a high proportion of non-uniformly distributed profiles that are mostly right-skewed and multimodal; the latter being a characteristic of major steady states during development. Our study also identifies and validates gene signatures driving complex dynamic processes during somatic or germline differentiation.
Affiliation(s)
- Benedict Anchang
- Corresponding author: Benedict Anchang, Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences. 111 T W Alexander Dr, Research Triangle Park, NC 27709, USA and Center for Cancer Research, National Cancer Institute, Bethesda, MD 20892, USA. Tel +1 984-287-3350; E-mail:
- Raul Mendez-Giraldez
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Stanford, California, USA
- Xiaojiang Xu
- Integrative Bioinformatics Support Group, National Institute of Environmental Health Sciences, Stanford, California, USA
- Trevor K Archer
- Epigenetics & Stem Cell Biology Laboratory/Chromatin & Gene Expression Group, National Institute of Environmental Health Sciences, Stanford, California, USA
- Qing Chen
- Epigenetics & Stem Cell Biology Laboratory/Chromatin & Gene Expression Group, National Institute of Environmental Health Sciences, Stanford, California, USA
- Guang Hu
- Epigenetics & Stem Cell Biology Laboratory/Chromatin & Gene Expression Group, National Institute of Environmental Health Sciences, Stanford, California, USA
- Sylvia K Plevritis
- Department of Biomedical Data Science, Center for Cancer Systems Biology, Stanford University, Stanford, California, USA
- Alison Anne Motsinger-Reif
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Stanford, California, USA
- Jian-Liang Li
- Integrative Bioinformatics Support Group, National Institute of Environmental Health Sciences, Stanford, California, USA
24
Ding J, Sharon N, Bar-Joseph Z. Temporal modelling using single-cell transcriptomics. Nat Rev Genet 2022; 23:355-368. [PMID: 35102309 DOI: 10.1038/s41576-021-00444-7] [Citation(s) in RCA: 84] [Impact Index Per Article: 28.0] [Accepted: 12/14/2021] [Indexed: 12/16/2022]
Abstract
Methods for profiling genes at the single-cell level have revolutionized our ability to study several biological processes and systems including development, differentiation, response programmes and disease progression. In many of these studies, cells are profiled over time in order to infer dynamic changes in cell states and types, sets of expressed genes, active pathways and key regulators. However, time-series single-cell RNA sequencing (scRNA-seq) also raises several new analysis and modelling issues. These issues range from determining when and how deep to profile cells, linking cells within and between time points, learning continuous trajectories, and integrating bulk and single-cell data for reconstructing models of dynamic networks. In this Review, we discuss several approaches for the analysis and modelling of time-series scRNA-seq, highlighting their steps, key assumptions, and the types of data and biological questions they are most appropriate for.
25
Rams M, Conrad TOF. Dictionary learning allows model-free pseudotime estimation of transcriptomic data. BMC Genomics 2022; 23:56. [PMID: 35033004 PMCID: PMC8760643 DOI: 10.1186/s12864-021-08276-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/04/2020] [Accepted: 12/22/2021] [Indexed: 11/10/2022] Open
Abstract
Background Pseudotime estimation from dynamic single-cell transcriptomic data enables characterisation and understanding of the underlying processes, for example developmental processes. Various pseudotime estimation methods have been proposed during the last years. Typically, these methods start with a dimension reduction step because the low-dimensional representation is usually easier to analyse. Approaches such as PCA, ICA or t-SNE belong to the most widely used methods for dimension reduction in pseudotime estimation methods. However, these methods usually make assumptions on the derived dimensions, which can result in important dataset properties being missed. In this paper, we suggest a new dictionary learning based approach, dynDLT, for dimension reduction and pseudotime estimation of dynamic transcriptomic data. Dictionary learning is a matrix factorisation approach that does not restrict the dependence of the derived dimensions. To evaluate the performance, we conduct a large simulation study and analyse 8 real-world datasets. Results The simulation studies reveal that firstly, dynDLT preserves the simulated patterns in low-dimension and the pseudotimes can be derived from the low-dimensional representation. Secondly, the results show that dynDLT is suitable for the detection of genes exhibiting the simulated dynamic patterns, thereby facilitating the interpretation of the compressed representation and thus the dynamic processes. For the real-world data analysis, we select datasets with samples that are taken at different time points throughout an experiment. The pseudotimes found by dynDLT have high correlations with the experimental times. We compare the results to other approaches used in pseudotime estimation, or those that are method-wise closely connected to dictionary learning: ICA, NMF, PCA, t-SNE, and UMAP. DynDLT has the best overall performance for the simulated and real-world datasets. 
Conclusions We introduce dynDLT, a method that is suitable for pseudotime estimation. Its main advantages are: (1) It presents a model-free approach, meaning that it does not restrict the dependence of the derived dimensions; (2) Genes that are relevant in the detected dynamic processes can be identified from the dictionary matrix; (3) By a restriction of the dictionary entries to positive values, the dictionary atoms are highly interpretable. Supplementary Information The online version contains supplementary material available at (10.1186/s12864-021-08276-9).
Affiliation(s)
- Mona Rams
- Freie Universitaet Berlin, Arnimallee 6, Berlin, 14195, Germany.
- Tim O F Conrad
- Konrad-Zuse-Zentrum für Informationstechnik Berlin, Takustraße 7, Berlin, 14195, Germany
26
OUP accepted manuscript. Brief Funct Genomics 2022; 21:159-176. [DOI: 10.1093/bfgp/elac002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/21/2021] [Revised: 01/20/2022] [Accepted: 01/25/2022] [Indexed: 11/14/2022] Open
27
Jeong H, Shin S, Yeom HG. Accurate Single-Cell Clustering through Ensemble Similarity Learning. Genes (Basel) 2021; 12:genes12111670. [PMID: 34828276 PMCID: PMC8623803 DOI: 10.3390/genes12111670] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2021] [Revised: 10/10/2021] [Accepted: 10/20/2021] [Indexed: 11/16/2022] Open
Abstract
Single-cell sequencing provides novel means to interpret the transcriptomic profiles of individual cells. In-depth analysis of single-cell sequencing requires effective computational methods to accurately predict single-cell clusters because single-cell sequencing techniques only provide the transcriptomic profiles of each cell. Although an accurate estimation of the cell-to-cell similarity is an essential first step to derive reliable single-cell clustering results, it is challenging to obtain an accurate similarity measurement because it depends heavily on the selection of genes for similarity evaluation, and the optimal set of genes for accurate similarity estimation is typically unknown. Moreover, due to technical limitations, single-cell sequencing includes a large number of artificial zeros, and the technical noise makes it difficult to develop effective single-cell clustering algorithms. Here, we describe a novel single-cell clustering algorithm that can accurately predict single-cell clusters in large-scale single-cell sequencing by effectively reducing the zero-inflated noise and accurately estimating the cell-to-cell similarities. First, we construct an ensemble similarity network based on different similarity estimates, and reduce the artificial noise using a random walk with restart framework. Finally, starting from a larger number of small but highly consistent clusters, we iteratively merge the pair of clusters with the maximum similarity until the predicted number of clusters is reached. Extensive performance evaluation shows that the proposed single-cell clustering algorithm can yield accurate single-cell clustering results and can help decipher the key messages underlying complex biological mechanisms.
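The random walk with restart step named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `random_walk_with_restart`, the restart probability of 0.5, and the five-node toy similarity network are all assumptions chosen for illustration.

```python
def random_walk_with_restart(similarity, restart=0.5, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for every seed node.

    similarity: square list of lists with non-negative edge weights.
    Row i of the result is the distribution reached when restarting at node i.
    """
    n = len(similarity)
    # Column-normalise so that each column is a transition distribution.
    col_sums = [sum(similarity[r][c] for r in range(n)) for c in range(n)]
    trans = [[similarity[r][c] / col_sums[c] if col_sums[c] > 0 else 0.0
              for c in range(n)] for r in range(n)]
    result = []
    for seed in range(n):
        start = [1.0 if i == seed else 0.0 for i in range(n)]
        p = start[:]
        for _ in range(max_iter):
            # One step: follow an edge with prob. (1 - restart), else jump home.
            nxt = [(1 - restart) * sum(trans[r][c] * p[c] for c in range(n))
                   + restart * start[r] for r in range(n)]
            delta = sum(abs(a - b) for a, b in zip(nxt, p))
            p = nxt
            if delta < tol:
                break
        result.append(p)
    return result


# Hypothetical network: nodes 0-2 form one tight group, 3-4 another,
# joined only by a weak edge between nodes 2 and 3.
toy = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0.1, 0],
    [0, 0, 0.1, 0, 1],
    [0, 0, 0, 1, 0],
]
smoothed = random_walk_with_restart(toy)
```

In this toy case the smoothed similarity from node 0 stays concentrated on its own group, which is the noise-reducing effect the abstract attributes to the random walk framework.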
Affiliation(s)
- Hyundoo Jeong
- Department of Mechatronics Engineering, Incheon National University, Incheon 22012, Korea;
- Sungtae Shin
- Department of Mechanical Engineering, Dong-A University, Busan 49315, Korea;
- Hong-Gi Yeom
- Department of Electronics Engineering, Chosun University, Gwangju 61452, Korea
- Correspondence:
28
Noise distorts the epigenetic landscape and shapes cell-fate decisions. Cell Syst 2021; 13:83-102.e6. [PMID: 34626539 DOI: 10.1016/j.cels.2021.09.002] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Received: 01/20/2021] [Revised: 06/21/2021] [Accepted: 09/02/2021] [Indexed: 12/24/2022]
Abstract
The Waddington epigenetic landscape has become an iconic representation of the cellular differentiation process. Recent single-cell transcriptomic data provide new opportunities for quantifying this originally conceptual tool, offering insight into the gene regulatory networks underlying cellular development. While many methods for constructing the landscape have been proposed, by far the most commonly employed approach is based on computing the landscape as the negative logarithm of the steady-state probability distribution. Here, we use simple models to highlight the complexities and limitations that arise when reconstructing the potential landscape in the presence of stochastic fluctuations. We consider how the landscape changes in accordance with different stochastic systems and show that it is the subtle interplay between the deterministic and stochastic components of the system that ultimately shapes the landscape. We further discuss how the presence of noise has important implications for the identifiability of the regulatory dynamics from experimental data. A record of this paper's transparent peer review process is included in the supplemental information.
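The commonly employed construction mentioned in this abstract, the landscape as the negative logarithm of the steady-state probability distribution, can be written as U(x) = -ln p(x). The following is a minimal one-dimensional sketch under stated assumptions: the density is estimated with a plain histogram, the bimodal samples are synthetic, and the name `quasi_potential` is hypothetical. It shows the construction only and does not reproduce the paper's models or its analysis of noise.

```python
import math
import random


def quasi_potential(samples, lo, hi, bins):
    """Histogram estimate of U(x) = -ln p(x) on equal-width bins over [lo, hi)."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in samples:
        if lo <= x < hi:
            # min() guards against rounding at the upper edge.
            counts[min(int((x - lo) / width), bins - 1)] += 1
    total = sum(counts)
    landscape = []
    for c in counts:
        if c == 0:
            landscape.append(math.inf)  # unvisited bins get infinite potential
        else:
            density = c / (total * width)
            landscape.append(-math.log(density))
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    return centers, landscape


# Synthetic stand-in for steady-state samples: two states near -2 and +2.
rng = random.Random(0)
samples = ([rng.gauss(-2.0, 0.5) for _ in range(5000)]
           + [rng.gauss(2.0, 0.5) for _ in range(5000)])
centers, landscape = quasi_potential(samples, lo=-4.0, hi=4.0, bins=16)
```

The two densely populated states appear as wells and the sparsely visited region around zero as a barrier. Because the estimate depends only on where samples accumulate, any change in the noise changes the estimated landscape, which is the sensitivity the abstract discusses.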
29
Wang X, Zheng J. Velo-Predictor: an ensemble learning pipeline for RNA velocity prediction. BMC Bioinformatics 2021; 22:419. [PMID: 34479487 PMCID: PMC8414693 DOI: 10.1186/s12859-021-04330-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/16/2021] [Accepted: 08/23/2021] [Indexed: 11/10/2022] Open
Abstract
Background RNA velocity is a novel and powerful concept which enables the inference of dynamical cell state changes from seemingly static single-cell RNA sequencing (scRNA-seq) data. However, accurate estimation of RNA velocity is still a challenging problem, and the underlying kinetic mechanisms of transcriptional and splicing regulations are not fully clear. Moreover, scRNA-seq data tend to be sparse compared with possible cell states, and a given dataset of estimated RNA velocities needs imputation for some cell states not yet covered. Results We formulate RNA velocity prediction as a supervised learning problem of classification for the first time, where a cell state space is divided into equal-sized segments by directions as classes, and the estimated RNA velocity vectors are considered as ground truth. We propose Velo-Predictor, an ensemble learning pipeline for predicting RNA velocities from scRNA-seq data. We test different models on two real datasets; Velo-Predictor exhibits good performance, especially when XGBoost is used as the base predictor. Parameter analysis and visualization also show that the method is robust and able to make biologically meaningful predictions. Conclusion The accurate result shows that Velo-Predictor can effectively simplify the procedure by learning a predictive model from gene expression data, which could help to construct a continuous landscape and give biologists an intuitive picture about the trend of cellular dynamics.
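One way to read "divided into equal-sized segments by directions as classes" is that each velocity vector is labeled by the angular sector it points into. The sketch below illustrates that reading only; the two-dimensional embedding, the choice of eight classes, and the function name `direction_class` are assumptions, and the paper's own segmentation may differ.

```python
import math


def direction_class(vx, vy, n_classes=8):
    """Map a 2-D velocity vector to the index of its angular sector.

    Sector 0 starts at the positive x-axis; sectors are counted
    counter-clockwise and each spans 2*pi / n_classes radians.
    """
    if vx == 0 and vy == 0:
        raise ValueError("a zero vector has no direction")
    angle = math.atan2(vy, vx) % (2 * math.pi)  # shift into [0, 2*pi)
    sector = int(angle / (2 * math.pi / n_classes))
    return min(sector, n_classes - 1)  # guard against rounding at 2*pi


# Hypothetical velocity vectors in a 2-D embedding.
vectors = [(1.0, 0.1), (0.1, 1.0), (-1.0, 0.1), (0.1, -1.0)]
labels = [direction_class(vx, vy) for vx, vy in vectors]
```

Labels of this kind could then serve as targets for any off-the-shelf classifier, with gene expression as the input features.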
Affiliation(s)
- Xin Wang
- School of Information Science and Technology, ShanghaiTech University, 393 Middle Huaxia Road, Pudong District, 201210, Shanghai, China
- Jie Zheng
- School of Information Science and Technology, ShanghaiTech University, 393 Middle Huaxia Road, Pudong District, 201210, Shanghai, China.
30
Li H. Single-cell RNA sequencing in Drosophila: Technologies and applications. Wiley Interdiscip Rev Dev Biol 2021; 10:e396. [PMID: 32940008 PMCID: PMC7960577 DOI: 10.1002/wdev.396] [Citation(s) in RCA: 36] [Impact Index Per Article: 9.0] [Received: 06/22/2020] [Revised: 08/09/2020] [Accepted: 08/20/2020] [Indexed: 12/12/2022]
Abstract
Single-cell RNA sequencing (scRNA-seq) has emerged as a powerful tool for investigating cell states and functions at the single-cell level. It has greatly revolutionized transcriptomic studies in many life science research fields, such as neurobiology, immunology, and developmental biology. With the fast development of both experimental platforms and bioinformatics approaches over the past decade, scRNA-seq is becoming economically feasible and experimentally practical for many biomedical laboratories. Drosophila has served as an excellent model organism for dissecting cellular and molecular mechanisms that underlie tissue development, adult cell function, disease, and aging. The recent application of scRNA-seq methods to Drosophila tissues has led to a number of exciting discoveries. In this review, I will provide a summary of recent scRNA-seq studies in Drosophila, focusing on technical approaches and biological applications. I will also discuss current challenges and future opportunities of making new discoveries using scRNA-seq in Drosophila. This article is categorized under: Technologies > Analysis of the Transcriptome.
Affiliation(s)
- Hongjie Li
- Department of Biology, Stanford University, Stanford, California, USA
31
Wei Z, Zhang S. CALLR: a semi-supervised cell-type annotation method for single-cell RNA sequencing data. Bioinformatics 2021; 37:i51-i58. [PMID: 34252936 PMCID: PMC8686678 DOI: 10.1093/bioinformatics/btab286] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Accepted: 04/23/2021] [Indexed: 12/13/2022] Open
Abstract
Motivation Single-cell RNA sequencing (scRNA-seq) technology has been widely applied to capture the heterogeneity of different cell types within complex tissues. An essential step in scRNA-seq data analysis is the annotation of cell types. Traditional cell-type annotation is mainly clustering the cells first, and then using the aggregated cluster-level expression profiles and the marker genes to label each cluster. Such methods are greatly dependent on the clustering results, which are insufficient for accurate annotation. Results In this article, we propose a semi-supervised learning method for cell-type annotation called CALLR. It combines unsupervised learning represented by the graph Laplacian matrix constructed from all the cells and supervised learning using sparse logistic regression. By alternately updating the cell clusters and annotation labels, high annotation accuracy can be achieved. The model is formulated as an optimization problem, and a computationally efficient algorithm is developed to solve it. Experiments on 10 real datasets show that CALLR outperforms the compared (semi-)supervised learning methods, and the popular clustering methods. Availability and implementation The implementation of CALLR is available at https://github.com/MathSZhang/CALLR. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ziyang Wei
- Department of Statistics, University of Chicago, Chicago, IL 60637, USA.,School of Mathematical Sciences, Fudan University, Shanghai 200433, China
- Shuqin Zhang
- School of Mathematical Sciences, Fudan University, Shanghai 200433, China.,Laboratory of Mathematics for Nonlinear Science, Fudan University, Shanghai 200433, China.,Shanghai Key Laboratory for Contemporary Applied Mathematics, Fudan University, Shanghai 200433, China
32
Mondal PK, Saha US, Mukhopadhyay I. PseudoGA: cell pseudotime reconstruction based on genetic algorithm. Nucleic Acids Res 2021; 49:7909-7924. [PMID: 34244782 PMCID: PMC8661435 DOI: 10.1093/nar/gkab457] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/19/2020] [Revised: 05/03/2021] [Accepted: 07/07/2021] [Indexed: 01/05/2023] Open
Abstract
Dynamic regulation of gene expression is often governed by progression through transient cell states. Bulk RNA-seq analysis can only detect average changes in expression levels and is unable to identify these dynamics. Single-cell RNA-seq presents an unprecedented opportunity that helps in placing the cells on a hypothetical time trajectory that reflects the gradual transition of their transcriptomes. This continuum trajectory, or ‘pseudotime’, may reveal the developmental pathway and provide us with information on dynamic transcriptomic changes and other biological processes. Existing approaches to building pseudotime depend heavily on reducing the very high dimensionality to extremely low-dimensional subspaces, which may lead to loss of information. We propose PseudoGA, a genetic-algorithm-based approach to order cells assuming that gene expressions vary according to a smooth curve along the pseudotime trajectory. We observe superior accuracy of our method in simulated as well as benchmarking real datasets. The generality of the assumption behind PseudoGA and its independence from any dimensionality reduction technique make it a robust choice for pseudotime estimation from single-cell transcriptome data. PseudoGA is also time efficient when applied to large single-cell RNA-seq datasets and adaptable to parallel computing. R code for PseudoGA is freely available at https://github.com/indranillab/pseudoga.
Affiliation(s)
- Pronoy Kanti Mondal
- Human Genetics Unit, Indian Statistical Institute, 203 B. T. Road, Kolkata 700108, West Bengal, India
- Udit Surya Saha
- Human Genetics Unit, Indian Statistical Institute, 203 B. T. Road, Kolkata 700108, West Bengal, India
- Indranil Mukhopadhyay
- Human Genetics Unit, Indian Statistical Institute, 203 B. T. Road, Kolkata 700108, West Bengal, India
33
Zhao C, Xiu W, Hua Y, Zhang N, Zhang Y. CStreet: a computed Cell State trajectory inference method for time-series single-cell RNA sequencing data. Bioinformatics 2021; 37:3774-3780. [PMID: 34196686 DOI: 10.1093/bioinformatics/btab488] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 11/27/2020] [Revised: 06/24/2021] [Accepted: 06/30/2021] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION The increasing amount of time-series single-cell RNA sequencing (scRNA-seq) data raises the key issue of connecting cell states (i.e., cell clusters or cell types) to obtain the continuous temporal dynamics of transcription, which can highlight the unified biological mechanisms involved in cell state transitions. However, most existing trajectory methods are specifically designed for individual cells, so they can hardly meet the needs of accurately inferring the trajectory topology of the cell state, which usually contains cells assigned to different branches. RESULTS Here, we present CStreet, a computed Cell State trajectory inference method for time-series scRNA-seq data. It uses time-series information to construct the k-nearest neighbors connections between cells within each time point and between adjacent time points. Then, CStreet estimates the connection probabilities of the cell states and visualizes the trajectory, which may include multiple starting points and paths, using a force-directed graph. By comparing the performance of CStreet with that of six commonly used cell state trajectory reconstruction methods on simulated data and real data, we demonstrate the high accuracy and high tolerance of CStreet. AVAILABILITY AND IMPLEMENTATION CStreet is written in Python and freely available on the web at https://github.com/TongjiZhanglab/CStreet and https://doi.org/10.5281/zenodo.4483205. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Chengchen Zhao
- Institute for Regenerative Medicine, Shanghai East Hospital, Shanghai Key Laboratory of Signaling and Disease Research, Frontier Science Center for Stem Cell Research, School of Life Science and Technology, Tongji University, Shanghai, 200092, China
- Wenchao Xiu
- Institute for Regenerative Medicine, Shanghai East Hospital, Shanghai Key Laboratory of Signaling and Disease Research, Frontier Science Center for Stem Cell Research, School of Life Science and Technology, Tongji University, Shanghai, 200092, China
- Yuwei Hua
- Institute for Regenerative Medicine, Shanghai East Hospital, Shanghai Key Laboratory of Signaling and Disease Research, Frontier Science Center for Stem Cell Research, School of Life Science and Technology, Tongji University, Shanghai, 200092, China
- Naiqian Zhang
- School of Mathematics and Statistics, Shandong University at Weihai, Weihai, 264209, China
- Yong Zhang
- Institute for Regenerative Medicine, Shanghai East Hospital, Shanghai Key Laboratory of Signaling and Disease Research, Frontier Science Center for Stem Cell Research, School of Life Science and Technology, Tongji University, Shanghai, 200092, China
34
Bartlett T. Fusion of single-cell transcriptome and DNA-binding data, for genomic network inference in cortical development. BMC Bioinformatics 2021; 22:301. [PMID: 34088262 PMCID: PMC8176738 DOI: 10.1186/s12859-021-04201-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2020] [Accepted: 05/12/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Network models are well-established as very useful computational-statistical tools in cell biology. However, a genomic network model based only on gene expression data can, by definition, only infer gene co-expression networks. Hence, in order to infer gene regulatory patterns, it is necessary to also include data related to binding of regulatory factors to DNA. RESULTS We propose a new dynamic genomic network model, for inferring patterns of genomic regulatory influence in dynamic processes such as development. Our model fuses experiment-specific gene expression data with publicly available DNA-binding data. The method we propose is computationally efficient, and can be applied to genome-wide data with tens of thousands of transcripts. Thus, our method is well suited for use as an exploratory tool for genome-wide data. We apply our method to data from human fetal cortical development, and our findings confirm genomic regulatory patterns which are recognised as being fundamental to neuronal development. CONCLUSIONS Our method provides a mathematical/computational toolbox which, when coupled with targeted experiments, will reveal and confirm important new functional genomic regulatory processes in mammalian development.
Affiliation(s)
- Thomas Bartlett
- University College London, Gower Street, London, WC1E 6BT, UK.
35
Camacho-Aguilar E, Warmflash A, Rand DA. Quantifying cell transitions in C. elegans with data-fitted landscape models. PLoS Comput Biol 2021; 17:e1009034. [PMID: 34061834 PMCID: PMC8195438 DOI: 10.1371/journal.pcbi.1009034] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 02/13/2021] [Revised: 06/11/2021] [Accepted: 05/03/2021] [Indexed: 12/19/2022] Open
Abstract
Increasing interest has emerged in new mathematical approaches that simplify the study of complex differentiation processes by formalizing Waddington's landscape metaphor. However, a rational method to build these landscape models remains an open problem. Here we study vulval development in C. elegans by developing a framework based on Catastrophe Theory (CT) and approximate Bayesian computation (ABC) to build data-fitted landscape models. We first identify the candidate qualitative landscapes, and then use CT to build the simplest model consistent with the data, which we quantitatively fit using ABC. The resulting model suggests that the underlying mechanism is a quantifiable two-step decision controlled by EGF and Notch-Delta signals, where a non-vulval/vulval decision is followed by a bistable transition to the two vulval states. This new model fits a broad set of data and makes several novel predictions.
Affiliation(s)
- Elena Camacho-Aguilar
- Mathematics Institute, University of Warwick, Coventry, United Kingdom
- Department of Biosciences, Rice University, Houston, Texas, United States of America
- Aryeh Warmflash
- Department of Biosciences, Rice University, Houston, Texas, United States of America
- Department of Bioengineering, Rice University, Houston, Texas, United States of America
- David A. Rand
- Mathematics Institute, University of Warwick, Coventry, United Kingdom
- Zeeman Institute for Systems Biology & Infectious Disease Epidemiology Research, University of Warwick, Coventry, United Kingdom
36
37
Dai Y, Xu A, Li J, Wu L, Yu S, Chen J, Zhao W, Sun XJ, Huang J. CytoTree: an R/Bioconductor package for analysis and visualization of flow and mass cytometry data. BMC Bioinformatics 2021; 22:138. [PMID: 33752602 PMCID: PMC7983272 DOI: 10.1186/s12859-021-04054-2] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 11/10/2020] [Accepted: 02/26/2021] [Indexed: 01/20/2023] Open
Abstract
BACKGROUND The rapidly increasing dimensionality and throughput of flow and mass cytometry data necessitate new bioinformatics tools for analysis and interpretation, and the recently emerging single-cell-based algorithms provide a powerful strategy to meet this challenge. RESULTS Here, we present CytoTree, an R/Bioconductor package designed to analyze and interpret multidimensional flow and mass cytometry data. CytoTree provides multiple computational functionalities that integrate most of the commonly used techniques in unsupervised clustering and dimensionality reduction and, more importantly, support the construction of a tree-shaped trajectory based on the minimum spanning tree algorithm. A graph-based algorithm is also implemented to estimate the pseudotime and infer intermediate-state cells. We apply CytoTree to several examples of mass cytometry and time-course flow cytometry data on heterogeneity-based cytology and differentiation/reprogramming experiments to illustrate the practical utility achieved in a fast and convenient manner. CONCLUSIONS CytoTree represents a versatile tool for analyzing multidimensional flow and mass cytometry data and for producing heuristic results for trajectory construction and pseudotime estimation in an integrated workflow.
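The idea of a tree-shaped trajectory built with a minimum spanning tree, followed by a graph-based pseudotime, can be sketched in a few lines. This is not CytoTree's code (CytoTree is an R package): the centroids below are hypothetical, Prim's algorithm stands in for whichever MST routine the package uses, and taking pseudotime as the rescaled tree distance from a user-chosen root is an assumption made for illustration.

```python
import math


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph; returns an adjacency dict."""
    n = len(points)
    adjacency = {i: [] for i in range(n)}
    # best[j] = (distance from j to the current tree, tree node achieving it)
    best = {j: (math.dist(points[0], points[j]), 0) for j in range(1, n)}
    while best:
        j = min(best, key=lambda k: best[k][0])
        d, parent = best.pop(j)
        adjacency[parent].append((j, d))
        adjacency[j].append((parent, d))
        # Node j is now in the tree; update the remaining candidates.
        for k in best:
            dk = math.dist(points[j], points[k])
            if dk < best[k][0]:
                best[k] = (dk, j)
    return adjacency


def tree_pseudotime(adjacency, root):
    """Path length from the root along the tree, rescaled to [0, 1]."""
    dist = {root: 0.0}
    stack = [root]
    while stack:
        node = stack.pop()
        for neighbour, d in adjacency[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + d
                stack.append(neighbour)
    longest = max(dist.values()) or 1.0
    return [dist[i] / longest for i in range(len(adjacency))]


# Hypothetical cluster centroids in a 2-D marker space, listed out of order.
centroids = [(0.0, 0.0), (3.0, 0.2), (1.0, 0.1), (2.0, 0.0), (2.5, 1.0)]
tree = minimum_spanning_tree(centroids)
pseudotime = tree_pseudotime(tree, root=0)
```

Sorting the clusters by `pseudotime` recovers an ordering along the tree. The result depends on which node is chosen as the root, so in practice the root would be picked from prior knowledge of the earliest cell state.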
Affiliation(s)
- Yuting Dai
- Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 197 Ruijin Er Road, Shanghai, 200025, China
- Aining Xu
- Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 197 Ruijin Er Road, Shanghai, 200025, China
- Jianfeng Li
- Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 197 Ruijin Er Road, Shanghai, 200025, China
- Liang Wu
- Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 197 Ruijin Er Road, Shanghai, 200025, China
- Shanhe Yu
- Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 197 Ruijin Er Road, Shanghai, 200025, China
- Jun Chen
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research and Center for Individualized Medicine, Mayo Clinic, 200 1st St SW, Rochester, MN, 55905, USA
- Weili Zhao
- Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 197 Ruijin Er Road, Shanghai, 200025, China.
- Xiao-Jian Sun
- Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 197 Ruijin Er Road, Shanghai, 200025, China.
- Jinyan Huang
- Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 197 Ruijin Er Road, Shanghai, 200025, China.
38
Kopf A, Claassen M. Latent representation learning in biology and translational medicine. Patterns (N Y) 2021; 2:100198. [PMID: 33748792 PMCID: PMC7961186 DOI: 10.1016/j.patter.2021.100198] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 02/08/2023]
Abstract
Current data generation capabilities in the life sciences render scientists in an apparently contradicting situation. While it is possible to simultaneously measure an ever-increasing number of systems parameters, the resulting data are becoming increasingly difficult to interpret. Latent variable modeling allows for such interpretation by learning non-measurable hidden variables from observations. This review gives an overview over the different formal approaches to latent variable modeling, as well as applications at different scales of biological systems, such as molecular structures, intra- and intercellular regulatory up to physiological networks. The focus is on demonstrating how these approaches have enabled interpretable representations and ultimately insights in each of these domains. We anticipate that a wider dissemination of latent variable modeling in the life sciences will enable a more effective and productive interpretation of studies based on heterogeneous and high-dimensional data modalities.
Affiliation(s)
- Andreas Kopf
  - Institute of Molecular Systems Biology, ETH Zürich, 8093 Zürich, Switzerland
- Manfred Claassen
  - Division of Clinical Bioinformatics, Department of Internal Medicine I, University Hospital Tübingen, 72076 Tübingen, Germany
  - Computer Science Department, Eberhard Karls University of Tübingen, 72076 Tübingen, Germany
  - Cluster of Excellence Machine Learning (EXC 2064), Eberhard Karls University of Tübingen, 72076 Tübingen, Germany

39
Revealing lineage-related signals in single-cell gene expression using random matrix theory. Proc Natl Acad Sci U S A 2021; 118:1913931118. [PMID: 33836557 PMCID: PMC7980374 DOI: 10.1073/pnas.1913931118] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/18/2022]
Abstract
Gene expression profiles of a cellular population, generated by single-cell RNA sequencing, contain rich information about biological state, including cell type, cell cycle phase, gene regulatory patterns, and location within the tissue of origin. A major challenge is to disentangle information about these different biological states from each other, including distinguishing them from cell lineage, since the correlation of cellular expression patterns is necessarily contaminated by ancestry. Here, we use a recent advance in random matrix theory, discovered in the context of protein phylogeny, to identify differentiation or ancestry-related processes in single-cell data. Qin and Colwell [C. Qin, L. J. Colwell, Proc. Natl. Acad. Sci. U.S.A. 115, 690-695 (2018)] showed that ancestral relationships in protein sequences create a power-law signature in the covariance eigenvalue distribution. We demonstrate the existence of such signatures in scRNA-seq data and that the genes driving them are indeed related to differentiation and developmental pathways. We predict the existence of similar power-law signatures for cells along linear trajectories and demonstrate this for linearly differentiating systems. Furthermore, we generalize to show that the same signatures can arise for cells along tissue-specific spatial trajectories. We illustrate these principles in diverse tissues and organisms, including the mammalian epidermis and lung, Drosophila whole-embryo, adult Hydra, dendritic cells, the intestinal epithelium, and cells undergoing induced pluripotent stem cell (iPSC) reprogramming. We show how these results can be used to interpret the gradual dynamics of lineage structure along iPSC reprogramming. Together, we provide a framework that can be used to identify signatures of specific biological processes in single-cell data without prior knowledge and to identify candidate genes associated with these processes.
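To make the idea of a power-law signature in the covariance eigenvalue distribution more concrete, the following is a minimal sketch and not the authors' pipeline. It assumes a cells × genes expression matrix and uses a plain least-squares fit of log(eigenvalue) against log(rank); the function names `covariance_eigenvalues` and `loglog_slope`, and the choice of how many top eigenvalues to fit, are hypothetical.

```python
import numpy as np

def covariance_eigenvalues(expr):
    """Eigenvalues of the gene-gene covariance matrix, largest first.

    expr: array of shape (n_cells, n_genes); an assumed input layout.
    """
    centered = expr - expr.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (expr.shape[0] - 1)
    eigvals = np.linalg.eigvalsh(cov)  # ascending order for symmetric input
    return eigvals[::-1]

def loglog_slope(eigvals, n_top):
    """Least-squares slope of log(eigenvalue) versus log(rank) over the top n_top values.

    An approximately straight line on this log-log scale is what a
    power-law spectrum would look like; the slope estimates its exponent.
    """
    top = np.asarray(eigvals, dtype=float)[:n_top]
    ranks = np.arange(1, n_top + 1, dtype=float)
    slope, _intercept = np.polyfit(np.log(ranks), np.log(top), 1)
    return slope
```

A fitted slope alone does not establish a power law; in practice one would compare the spectrum against a suitable null model before interpreting it.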

40
Pretschner A, Pabel S, Haas M, Heiner M, Marwan W. Regulatory Dynamics of Cell Differentiation Revealed by True Time Series From Multinucleate Single Cells. Front Genet 2021; 11:612256. [PMID: 33488676 PMCID: PMC7820898 DOI: 10.3389/fgene.2020.612256] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/30/2020] [Accepted: 12/07/2020] [Indexed: 12/31/2022]
Abstract
Dynamics of cell fate decisions are commonly investigated by inferring temporal sequences of gene expression states from assembled snapshots of individual cells, where each cell is measured once. Ordering cells according to minimal differences in expression patterns, and assuming that differentiation occurs by a sequence of irreversible steps, yields unidirectional, eventually branching Markov chains with a single source node. In an alternative approach, we used multinucleate cells to follow gene expression by taking true time series. Assembling state machines, each made from single-cell trajectories, gives a network of highly structured Markov chains of states with different source and sink nodes, including cycles, revealing essential information on the dynamics of regulatory events. We argue that the obtained networks depict aspects of the Waddington landscape of cell differentiation and characterize them as reachability graphs that provide the basis for the reconstruction of the underlying gene regulatory network.
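As an illustration of how single-cell time series can be assembled into a Markov chain with source and sink nodes, here is a minimal sketch. It assumes each trajectory has already been discretized into a sequence of state labels; the labels, the helper names, and the treatment of self-transitions are assumptions for illustration, not the procedure used in the paper.

```python
from collections import defaultdict

def transition_counts(trajectories):
    """Count observed state-to-state transitions across all trajectories."""
    counts = defaultdict(lambda: defaultdict(int))
    for traj in trajectories:
        for src, dst in zip(traj, traj[1:]):
            counts[src][dst] += 1
    return counts

def transition_probabilities(counts):
    """Normalize each state's outgoing counts into transition probabilities."""
    probs = {}
    for src, dsts in counts.items():
        total = sum(dsts.values())
        probs[src] = {dst: n / total for dst, n in dsts.items()}
    return probs

def sources_and_sinks(trajectories):
    """Return (states with no incoming edge, states with no outgoing edge).

    Self-transitions are ignored so that a state which only repeats itself
    still counts as a sink.
    """
    states, has_in, has_out = set(), set(), set()
    for traj in trajectories:
        states.update(traj)
        for src, dst in zip(traj, traj[1:]):
            if src != dst:
                has_out.add(src)
                has_in.add(dst)
    return states - has_in, states - has_out
```

With hypothetical trajectories `[["A", "B", "C"], ["A", "B", "D"], ["E", "B", "C"]]`, state `B` branches toward `C` and `D`, and the chain has two sources and two sinks, which a single-source pseudotime ordering could not represent.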
Affiliation(s)
- Anna Pretschner
  - Magdeburg Centre for Systems Biology and Institute of Biology, Otto von Guericke University, Magdeburg, Germany
- Sophie Pabel
  - Magdeburg Centre for Systems Biology and Institute of Biology, Otto von Guericke University, Magdeburg, Germany
- Markus Haas
  - Magdeburg Centre for Systems Biology and Institute of Biology, Otto von Guericke University, Magdeburg, Germany
- Monika Heiner
  - Computer Science Institute, Brandenburg University of Technology Cottbus-Senftenberg, Cottbus, Germany
- Wolfgang Marwan
  - Magdeburg Centre for Systems Biology and Institute of Biology, Otto von Guericke University, Magdeburg, Germany

41
Lieberman B, Kusi M, Hung CN, Chou CW, He N, Ho YY, Taverna JA, Huang THM, Chen CL. Toward uncharted territory of cellular heterogeneity: advances and applications of single-cell RNA-seq. Journal of Translational Genetics and Genomics 2021; 5:1-21. [PMID: 34322662 PMCID: PMC8315474 DOI: 10.20517/jtgg.2020.51] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/13/2022]
Abstract
Among single-cell analysis technologies, single-cell RNA-seq (scRNA-seq) has been one of the front runners in technical innovation. Since its introduction, scRNA-seq has been well received and has undergone many fast-paced technical improvements in cDNA synthesis and amplification, processing and alignment of next-generation sequencing reads, differentially expressed gene calling, cell clustering, subpopulation identification, and developmental trajectory prediction. scRNA-seq has been applied at an exponentially growing rate to study global transcriptional profiles in all cell types in humans and animal models, healthy or diseased, including cancer. A growing number of novel subtypes and rare subpopulations have been discovered as potential underlying mechanisms of stochasticity, differentiation, proliferation, tumorigenesis, and aging. scRNA-seq has gradually revealed the uncharted territory of cellular heterogeneity in transcriptomes and has led to novel therapeutic approaches for biomedical applications. This review of advances in scRNA-seq methods provides an exploratory guide to the quickly evolving technical landscape and insights into the focused features and strengths of each prominent area of progress.
Affiliation(s)
- Brandon Lieberman
  - Department of Molecular Medicine, University of Texas Health Science Center at San Antonio, San Antonio, TX 78229, USA
- Meena Kusi
  - Department of Molecular Medicine, University of Texas Health Science Center at San Antonio, San Antonio, TX 78229, USA
- Chia-Nung Hung
  - Department of Molecular Medicine, University of Texas Health Science Center at San Antonio, San Antonio, TX 78229, USA
- Chih-Wei Chou
  - Department of Molecular Medicine, University of Texas Health Science Center at San Antonio, San Antonio, TX 78229, USA
- Ning He
  - Department of Nursing, University of Texas Health Science Center at San Antonio, San Antonio, TX 78229, USA
- Yen-Yi Ho
  - Department of Statistics, University of South Carolina, Columbia, SC 29208, USA
- Josephine A. Taverna
  - Department of Medicine, University of Texas Health Science Center at San Antonio, San Antonio, TX 78229, USA
  - Mays Cancer Center, University of Texas Health Science Center at San Antonio, San Antonio, TX 78229, USA
- Tim H. M. Huang
  - Department of Molecular Medicine, University of Texas Health Science Center at San Antonio, San Antonio, TX 78229, USA
  - Mays Cancer Center, University of Texas Health Science Center at San Antonio, San Antonio, TX 78229, USA
- Chun-Liang Chen
  - Department of Molecular Medicine, University of Texas Health Science Center at San Antonio, San Antonio, TX 78229, USA
  - Mays Cancer Center, University of Texas Health Science Center at San Antonio, San Antonio, TX 78229, USA

42
Goodwin K, Nelson CM. Uncovering cellular networks in branching morphogenesis using single-cell transcriptomics. Curr Top Dev Biol 2020; 143:239-280. [PMID: 33820623 DOI: 10.1016/bs.ctdb.2020.09.004] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 12/13/2022]
Abstract
Single-cell RNA-sequencing (scRNA-seq) and related technologies to identify cell types and measure gene expression in space, in time, and within lineages have multiplied rapidly in recent years. As these techniques proliferate, we are seeing an increase in their application to the study of developing tissues. Here, we focus on single-cell investigations of branching morphogenesis. Branched organs are highly complex but typically develop recursively, such that a given developmental stage theoretically contains the entire spectrum of cell identities from progenitor to terminally differentiated. Therefore, branched organs are a highly attractive system for study by scRNA-seq. First, we provide an update on advances in the field of scRNA-seq analysis, focusing on spatial transcriptomics, computational reconstruction of differentiation trajectories, and integration of scRNA-seq with lineage tracing. In addition, we discuss the possibilities and limitations for applying these techniques to studying branched organs. We then discuss exciting advances made using scRNA-seq in the study of branching morphogenesis and differentiation in mammalian organs, with emphasis on the lung, kidney, and mammary gland. We propose ways that scRNA-seq could be used to address outstanding questions in each organ. Finally, we highlight the importance of physical and mechanical signals in branching morphogenesis and speculate about how scRNA-seq and related techniques could be applied to study tissue morphogenesis beyond just differentiation.
Affiliation(s)
- Katharine Goodwin
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, United States
- Celeste M Nelson
  - Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ, United States
  - Department of Molecular Biology, Princeton University, Princeton, NJ, United States

43
Chen X, Chen S, Jiang R. EnClaSC: a novel ensemble approach for accurate and robust cell-type classification of single-cell transcriptomes. BMC Bioinformatics 2020; 21:392. [PMID: 32938367 PMCID: PMC7496207 DOI: 10.1186/s12859-020-03679-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/10/2022]
Abstract
BACKGROUND In recent years, the rapid development of single-cell RNA-sequencing (scRNA-seq) techniques has enabled the quantitative characterization of cell types at single-cell resolution. With the explosive growth of the number of cells profiled in individual scRNA-seq experiments, there is a demand for novel computational methods for classifying newly generated scRNA-seq data onto annotated labels. Although several methods have recently been proposed for the cell-type classification of single-cell transcriptomic data, limitations such as inadequate accuracy, inferior robustness, and low stability greatly restrict their wide application. RESULTS We propose a novel ensemble approach, named EnClaSC, for accurate and robust cell-type classification of single-cell transcriptomic data. Through comprehensive validation experiments, we demonstrate that EnClaSC can not only be applied to self-projection within a specific dataset and to cell-type classification across different datasets, but also scale well across varying data dimensionality and data sparsity. We further illustrate the ability of EnClaSC to make effective cross-species classifications, which may shed light on studies of the correlation between different species. EnClaSC is freely available at https://github.com/xy-chen16/EnClaSC. CONCLUSIONS EnClaSC enables highly accurate and robust cell-type classification of single-cell transcriptomic data via an ensemble learning method. We expect to see wide application of our method not only to transcriptome studies, but also to the classification of more general data.
Affiliation(s)
- Xiaoyang Chen
  - MOE Key Laboratory of Bioinformatics, Bioinformatics Division and Center for Synthetic and Systems Biology, Beijing National Research Center for Information Science and Technology, Department of Automation, Tsinghua University, Beijing, 100084, China
- Shengquan Chen
  - MOE Key Laboratory of Bioinformatics, Bioinformatics Division and Center for Synthetic and Systems Biology, Beijing National Research Center for Information Science and Technology, Department of Automation, Tsinghua University, Beijing, 100084, China
- Rui Jiang
  - MOE Key Laboratory of Bioinformatics, Bioinformatics Division and Center for Synthetic and Systems Biology, Beijing National Research Center for Information Science and Technology, Department of Automation, Tsinghua University, Beijing, 100084, China

44
Single-cell transcriptomic atlas of the human endometrium during the menstrual cycle. Nat Med 2020; 26:1644-1653. [PMID: 32929266 DOI: 10.1038/s41591-020-1040-z] [Citation(s) in RCA: 314] [Impact Index Per Article: 62.8] [Received: 08/06/2019] [Accepted: 07/29/2020] [Indexed: 12/20/2022]
Abstract
In a human menstrual cycle, the endometrium undergoes remodeling, shedding and regeneration, all of which are driven by substantial gene expression changes in the underlying cellular hierarchy. Despite its importance in human fertility and regenerative biology, our understanding of this unique type of tissue homeostasis remains rudimentary. We characterized the transcriptomic transformation of human endometrium at single-cell resolution across the menstrual cycle, resolving cellular heterogeneity in multiple dimensions. We profiled the behavior of seven endometrial cell types, including a previously uncharacterized ciliated cell type, during four major phases of endometrial transformation, and found characteristic signatures for each cell type and phase. We discovered that the human window of implantation opens with an abrupt and discontinuous transcriptomic activation in the epithelia, accompanied by a widespread decidualization feature in the stromal fibroblasts. Our study provides a high-resolution molecular and cellular characterization of human endometrial transformation across the menstrual cycle, providing insights into this essential physiological process.

45
Lin C, Bar-Joseph Z. Continuous-state HMMs for modeling time-series single-cell RNA-Seq data. Bioinformatics 2020; 35:4707-4715. [PMID: 31038684 DOI: 10.1093/bioinformatics/btz296] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 06/18/2018] [Revised: 02/11/2019] [Accepted: 04/18/2019] [Indexed: 12/11/2022]
Abstract
MOTIVATION Methods for reconstructing developmental trajectories from time-series single-cell RNA-Seq (scRNA-Seq) data can be largely divided into two categories. The first, often referred to as pseudotime ordering methods, are deterministic and rely on dimensionality reduction followed by an ordering step. The second learns a probabilistic branching model to represent the developmental process. While both types have been successful, each suffers from shortcomings that can impact its accuracy. RESULTS We developed a new method based on continuous-state HMMs (CSHMMs) for representing and modeling time-series scRNA-Seq data. We define the CSHMM model and provide efficient learning and inference algorithms which allow the method to determine both the structure of the branching process and the assignment of cells to these branches. Analyzing several developmental single-cell datasets, we show that the CSHMM method accurately infers branching topology and correctly and continuously assigns cells to paths, improving upon prior methods proposed for this task. Analysis of genes based on the continuous cell assignment identifies known and novel markers for different cell types. AVAILABILITY AND IMPLEMENTATION Software and Supporting website: www.andrew.cmu.edu/user/chiehl1/CSHMM/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Chieh Lin
  - Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, US
- Ziv Bar-Joseph
  - Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, US
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, US

46
Unsupervised generative and graph representation learning for modelling cell differentiation. Sci Rep 2020; 10:9790. [PMID: 32555334 PMCID: PMC7300092 DOI: 10.1038/s41598-020-66166-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 09/01/2019] [Accepted: 02/10/2020] [Indexed: 12/22/2022]
Abstract
Using machine learning techniques to build representations from biomedical data can help us understand the latent biological mechanism of action and lead to important discoveries. Recent developments in single-cell RNA-sequencing protocols have allowed measuring gene expression for individual cells in a population, thus opening up the possibility of finding answers to biomedical questions about cell differentiation. In this paper, we explore unsupervised generative neural methods, based on the variational autoencoder, that can model cell differentiation by building meaningful representations from the high-dimensional and complex gene expression data. We use disentanglement methods based on information theory to improve the data representation and achieve better separation of the biological factors of variation in the gene expression data. In addition, we use a graph autoencoder consisting of graph convolutional layers to predict relationships between single cells. Based on these models, we develop a computational framework that consists of methods for identifying the cell types in the dataset, finding driver genes for the differentiation process and obtaining a better understanding of relationships between cells. We illustrate our methods on datasets from multiple species and also from different sequencing technologies.

47
Chen Z, An S, Bai X, Gong F, Ma L, Wan L. DensityPath: an algorithm to visualize and reconstruct cell state-transition path on density landscape for single-cell RNA sequencing data. Bioinformatics 2020; 35:2593-2601. [PMID: 30535348 DOI: 10.1093/bioinformatics/bty1009] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 05/29/2018] [Revised: 11/14/2018] [Accepted: 12/06/2018] [Indexed: 12/16/2022]
Abstract
MOTIVATION Visualizing and reconstructing cell developmental trajectories intrinsically embedded in high-dimensional expression profiles of single-cell RNA sequencing (scRNA-seq) snapshot data are computationally intriguing, but challenging. RESULTS We propose DensityPath, an algorithm allowing (i) visualization of the intrinsic structure of scRNA-seq data on an embedded 2-d space and (ii) reconstruction of an optimal cell state-transition path on the density landscape. DensityPath powerfully handles high dimensionality and heterogeneity of scRNA-seq data by (i) revealing the intrinsic structures of data, while adopting a non-linear dimension reduction algorithm, termed elastic embedding, which can preserve both local and global structures of the data; and (ii) extracting the topological features of high-density, level-set clusters from a single-cell multimodal density landscape of transcriptional heterogeneity, as the representative cell states. DensityPath reconstructs the optimal cell state-transition path by finding the geodesic minimum spanning tree of representative cell states on the density landscape, establishing a least action path with the minimum-transition-energy of cell fate decisions. We demonstrate that DensityPath can ably reconstruct complex trajectories of cell development, e.g. those with multiple bifurcating and trifurcating branches, while maintaining computational efficiency. Moreover, DensityPath has high accuracy for pseudotime calculation and branch assignment on real scRNA-seq, as well as simulated datasets. DensityPath is robust to parameter choices, as well as permutations of data. AVAILABILITY AND IMPLEMENTATION DensityPath software is available at https://github.com/ucasdp/DensityPath. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
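The path-reconstruction step described above rests on a minimum spanning tree over representative cell states. The following is a minimal sketch of that building block only, using Prim's algorithm with plain Euclidean distances as a stand-in for the geodesic distances on the density landscape that the abstract describes. The function name and the input layout (one row of coordinates per representative state) are assumptions, and this is not the DensityPath implementation.

```python
import numpy as np

def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph over `points`.

    points: array-like of shape (n_states, n_dims).
    Returns a list of (i, j, distance) edges, n_states - 1 in total.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    # Pairwise Euclidean distances (a simplification of geodesic distances).
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best_dist = dist[0].copy()          # cheapest known link from the tree to each node
    best_from = np.zeros(n, dtype=int)  # tree node that provides that link
    edges = []
    for _ in range(n - 1):
        candidates = np.where(in_tree, np.inf, best_dist)
        j = int(np.argmin(candidates))
        edges.append((int(best_from[j]), j, float(best_dist[j])))
        in_tree[j] = True
        closer = dist[j] < best_dist
        best_dist = np.where(closer, dist[j], best_dist)
        best_from = np.where(closer, j, best_from)
    return edges
```

For four states placed on a line at x = 0, 1, 3 and 6, the tree simply chains neighbouring states together. Branching trajectories appear when a state is the nearest tree neighbour of more than one other state.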
Affiliation(s)
- Ziwei Chen
  - NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing
  - University of Chinese Academy of Sciences, Beijing
- Shaokun An
  - NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing
  - University of Chinese Academy of Sciences, Beijing
- Xiangqi Bai
  - NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing
  - University of Chinese Academy of Sciences, Beijing
- Fuzhou Gong
  - NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing
  - University of Chinese Academy of Sciences, Beijing
- Liang Ma
  - University of Chinese Academy of Sciences, Beijing
  - Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- Lin Wan
  - NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing
  - University of Chinese Academy of Sciences, Beijing

48
Klimovskaia A, Lopez-Paz D, Bottou L, Nickel M. Poincaré maps for analyzing complex hierarchies in single-cell data. Nat Commun 2020; 11:2966. [PMID: 32528075 PMCID: PMC7290024 DOI: 10.1038/s41467-020-16822-4] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 07/26/2019] [Accepted: 05/25/2020] [Indexed: 01/23/2023]
Abstract
The need to understand cell developmental processes spawned a plethora of computational methods for discovering hierarchies from scRNAseq data. However, existing techniques are based on Euclidean geometry, a suboptimal choice for modeling complex cell trajectories with multiple branches. To overcome this fundamental representation issue, we propose Poincaré maps, a method that harnesses the power of hyperbolic geometry in the realm of single-cell data analysis. Often understood as a continuous extension of trees, hyperbolic geometry enables the embedding of complex hierarchical data in only two dimensions while preserving the pairwise distances between points in the hierarchy. This enables the use of our embeddings in a wide variety of downstream data analysis tasks, such as visualization, clustering, lineage detection and pseudotime inference. When compared to existing methods, which are unable to address all these important tasks using a single embedding, Poincaré maps produce state-of-the-art two-dimensional representations of cell trajectories on multiple scRNAseq datasets. The discovery of hierarchies in biological processes is central to developmental biology. Here the authors propose Poincaré maps, a method based on hyperbolic geometry to discover continuous hierarchies from pairwise similarities.
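To show why hyperbolic geometry suits tree-like hierarchies, here is a minimal sketch of the standard distance function of the Poincaré ball model, d(u, v) = arcosh(1 + 2‖u − v‖² / ((1 − ‖u‖²)(1 − ‖v‖²))). It illustrates the geometry only and is not the authors' embedding code; the function name is hypothetical.

```python
import math

def poincare_distance(u, v):
    """Hyperbolic distance between two points strictly inside the unit ball."""
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    ratio = 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v))
    return math.acosh(1.0 + ratio)
```

Distances grow rapidly toward the boundary of the disk: two points at Euclidean radius 0.9 on opposite sides of the origin are much farther apart hyperbolically than two points at radius 0.5, which leaves room near the boundary for the exponentially many leaves of a branching hierarchy.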
Affiliation(s)
- Léon Bottou
  - Facebook AI, 770 Broadway, New York, NY, 10003, USA

49
Liao J, Lu X, Shao X, Zhu L, Fan X. Uncovering an Organ's Molecular Architecture at Single-Cell Resolution by Spatially Resolved Transcriptomics. Trends Biotechnol 2020; 39:43-58. [PMID: 32505359 DOI: 10.1016/j.tibtech.2020.05.006] [Citation(s) in RCA: 152] [Impact Index Per Article: 30.4] [Received: 12/28/2019] [Revised: 05/11/2020] [Accepted: 05/12/2020] [Indexed: 01/17/2023]
Abstract
Revealing fine-scale cellular heterogeneity in its spatial context, together with the functional and structural foundations of tissue architecture, is fundamental to biological research and pharmacology. Unlike traditional approaches involving single molecules or bulk omics, cutting-edge, spatially resolved transcriptomics techniques offer near-single-cell or even subcellular resolution within tissues. Massive information across higher dimensions, along with position-coordinating labels, can better map the whole 3D transcriptional landscape of tissues. In this review, we focus on developments and strategies in spatially resolved transcriptomics, compare the cell and gene throughput and spatial resolution in detail for existing methods, and highlight the enormous potential in biomedical research.
Affiliation(s)
- Jie Liao
  - Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Xiaoyan Lu
  - Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Xin Shao
  - Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Ling Zhu
  - The Save Sight Institute, Faculty of Medicine and Health, the University of Sydney, Sydney, NSW 2000, Australia
- Xiaohui Fan
  - Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
  - The Save Sight Institute, Faculty of Medicine and Health, the University of Sydney, Sydney, NSW 2000, Australia

50