1
Askary A, Chen W, Choi J, Du LY, Elowitz MB, Gagnon JA, Schier AF, Seidel S, Shendure J, Stadler T, Tran M. The lives of cells, recorded. Nat Rev Genet 2025; 26:203-222. [PMID: 39587306 DOI: 10.1038/s41576-024-00788-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 09/26/2024] [Indexed: 11/27/2024]
Abstract
A paradigm for biology is emerging in which cells can be genetically programmed to write their histories into their own genomes. These records can subsequently be read, and the cellular histories reconstructed, which for each cell could include a record of its lineage relationships, extrinsic influences, internal states and physical locations, over time. DNA recording has the potential to transform the way that we study developmental and disease processes. Recent advances in genome engineering are driving the development of systems for DNA recording, and meanwhile single-cell and spatial omics technologies increasingly enable the recovery of the recorded information. Combined with advances in computational and phylogenetic inference algorithms, the DNA recording paradigm is beginning to bear fruit. In this Perspective, we explore the rationale and technical basis of DNA recording, what aspects of cellular biology might be recorded and how, and the types of discovery that we anticipate this paradigm will enable.
Affiliation(s)
- Amjad Askary
  - Department of Molecular, Cell and Developmental Biology, University of California, Los Angeles, CA, USA
- Wei Chen
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
- Junhong Choi
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
  - Developmental Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Lucia Y Du
  - Biozentrum, University of Basel, Basel, Switzerland
  - Allen Discovery Center for Cell Lineage Tracing, Seattle, WA, USA
- Michael B Elowitz
  - Allen Discovery Center for Cell Lineage Tracing, Seattle, WA, USA
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
  - Howard Hughes Medical Institute, California Institute of Technology, Pasadena, CA, USA
- James A Gagnon
  - School of Biological Sciences, University of Utah, Salt Lake City, UT, USA
- Alexander F Schier
  - Biozentrum, University of Basel, Basel, Switzerland
  - Allen Discovery Center for Cell Lineage Tracing, Seattle, WA, USA
- Sophie Seidel
  - Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Jay Shendure
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
  - Allen Discovery Center for Cell Lineage Tracing, Seattle, WA, USA
  - Howard Hughes Medical Institute, Seattle, WA, USA
  - Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA, USA
  - Seattle Hub for Synthetic Biology, Seattle, WA, USA
- Tanja Stadler
  - Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Martin Tran
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
2
Zwaans A, Seidel S, Manceau M, Stadler T. A Bayesian phylodynamic inference framework for single-cell CRISPR/Cas9 lineage tracing barcode data with dependent target sites. Philos Trans R Soc Lond B Biol Sci 2025; 380:20230318. [PMID: 39976408 PMCID: PMC11867110 DOI: 10.1098/rstb.2023.0318] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/31/2024] [Revised: 08/03/2024] [Accepted: 08/05/2024] [Indexed: 02/21/2025] Open
Abstract
Analysing single-cell lineage relationships of an organism is crucial towards understanding the fundamental cellular dynamics that drive development. Clustered regularly interspaced short palindromic repeats (CRISPR)-based dynamic lineage tracing relies on recent advances in genome editing and sequencing technologies to generate inheritable, evolving genetic barcode sequences that enable reconstruction of such cell lineage trees, also referred to as phylogenetic trees. Recent work generated custom computational strategies to produce robust tree estimates from such data. We further capitalize on these advancements and introduce GESTALT analysis using Bayesian inference (GABI), which extends the analysis of genome editing of synthetic target arrays for lineage tracing (GESTALT) data to a fully integrated Bayesian phylogenetic inference framework in software BEAST 2. This implementation allows users to represent the uncertainty in reconstructed trees and enables their scaling in absolute time. Furthermore, based on such time-scaled lineage trees, the underlying processes of growth, differentiation and apoptosis are quantified through so-called phylodynamic inference, typically relying on a birth-death or coalescent model. After validating its implementation, we demonstrate that our methodology results in robust estimates of growth dynamics characteristic of early Danio rerio development. GABI's codebase is publicly available at https://github.com/azwaans/GABI. This article is part of the theme issue '"A mathematical theory of evolution": phylogenetic models dating back 100 years'.
Affiliation(s)
- A. Zwaans
  - Department of Biosystems Science and Engineering, ETH Zurich, Zurich, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- S. Seidel
  - Department of Biosystems Science and Engineering, ETH Zurich, Zurich, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- M. Manceau
  - Department of Biosystems Science and Engineering, ETH Zurich, Zurich, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- T. Stadler
  - Department of Biosystems Science and Engineering, ETH Zurich, Zurich, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
3
Zhang X, Huang Y, Yang Y, Wang QE, Li L. Advancements in prospective single-cell lineage barcoding and their applications in research. Genome Res 2024; 34:2147-2162. [PMID: 39572229 PMCID: PMC11694748 DOI: 10.1101/gr.278944.124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2024] [Accepted: 10/03/2024] [Indexed: 12/25/2024]
Abstract
Single-cell lineage tracing (scLT) has emerged as a powerful tool, providing unparalleled resolution to investigate cellular dynamics, fate determination, and the underlying molecular mechanisms. This review thoroughly examines the latest prospective lineage DNA barcode tracing technologies. It further highlights pivotal studies that leverage single-cell lentiviral integration barcoding technology to unravel the dynamic nature of cell lineages in both developmental biology and cancer research. Additionally, the review navigates through critical considerations for successful experimental design in lineage tracing and addresses challenges inherent in this field, including technical limitations, complexities in data analysis, and the imperative for standardization. It also outlines current gaps in knowledge and suggests future research directions, contributing to the ongoing advancement of scLT studies.
Affiliation(s)
- Xiaoli Zhang
  - College of Nursing, University of South Florida, Tampa, Florida 33620, USA
- Yirui Huang
  - College of Pharmacy, The Ohio State University, Columbus, Ohio 43210, USA
- Yajing Yang
  - Department of Radiation Oncology, Comprehensive Cancer Center, The Ohio State University, Columbus, Ohio 43210, USA
- Qi-En Wang
  - Department of Radiation Oncology, Comprehensive Cancer Center, The Ohio State University, Columbus, Ohio 43210, USA
- Lang Li
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, Ohio 43210, USA
4
Jones MG, Sun D, Min KH(J), Colgan WN, Tian L, Weir JA, Chen VZ, Koblan LW, Yost KE, Mathey-Andrews N, Russell AJ, Stickels RR, Balderrama KS, Rideout WM, Chang HY, Jacks T, Chen F, Weissman JS, Yosef N, Yang D. Spatiotemporal lineage tracing reveals the dynamic spatial architecture of tumor growth and metastasis. bioRxiv 2024:2024.10.21.619529. [PMID: 39484491 PMCID: PMC11526908 DOI: 10.1101/2024.10.21.619529] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2024]
Abstract
Tumor progression is driven by dynamic interactions between cancer cells and their surrounding microenvironment. Investigating the spatiotemporal evolution of tumors can provide crucial insights into how intrinsic changes within cancer cells and extrinsic alterations in the microenvironment cooperate to drive different stages of tumor progression. Here, we integrate high-resolution spatial transcriptomics and evolving lineage tracing technologies to elucidate how tumor expansion, plasticity, and metastasis co-evolve with microenvironmental remodeling in a Kras;p53-driven mouse model of lung adenocarcinoma. We find that rapid tumor expansion contributes to a hypoxic, immunosuppressive, and fibrotic microenvironment that is associated with the emergence of pro-metastatic cancer cell states. Furthermore, metastases arise from spatially-confined subclones of primary tumors and remodel the distant metastatic niche into a fibrotic, collagen-rich microenvironment. Together, we present a comprehensive dataset integrating spatial assays and lineage tracing to elucidate how sequential changes in cancer cell state and microenvironmental structures cooperate to promote tumor progression.
Affiliation(s)
- Matthew G. Jones
  - Center for Personal Dynamic Regulomes, Stanford University, Stanford, CA, USA
  - These authors contributed equally
- Dawei Sun
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA
  - These authors contributed equally
- Kyung Hoi (Joseph) Min
  - Whitehead Institute for Biomedical Research, Cambridge, MA, USA
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA
- William N. Colgan
  - Whitehead Institute for Biomedical Research, Cambridge, MA, USA
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
- Luyi Tian
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Jackson A. Weir
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Biological and Biomedical Sciences Program, Harvard University, Cambridge, MA, USA
- Victor Z. Chen
  - Department of Molecular Pharmacology and Therapeutics, Columbia University, New York City, NY, USA
  - Department of Systems Biology, Columbia University, New York City, NY, USA
- Luke W. Koblan
  - Whitehead Institute for Biomedical Research, Cambridge, MA, USA
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
- Kathryn E. Yost
  - Whitehead Institute for Biomedical Research, Cambridge, MA, USA
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
- Nicolas Mathey-Andrews
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
  - David H. Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Harvard Medical School, Boston, MA, USA
- Andrew J.C. Russell
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA
- William M. Rideout
  - David H. Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Howard Y. Chang
  - Center for Personal Dynamic Regulomes, Stanford University, Stanford, CA, USA
  - Department of Genetics, Stanford University, Stanford, CA, USA
  - Howard Hughes Medical Institute, Stanford University School of Medicine, Stanford, CA, USA
- Tyler Jacks
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
  - David H. Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Fei Chen
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA
- Jonathan S. Weissman
  - Whitehead Institute for Biomedical Research, Cambridge, MA, USA
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
  - David H. Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Howard Hughes Medical Institute, Massachusetts Institute of Technology, Cambridge, MA, USA
- Nir Yosef
  - Department of Systems Immunology, Weizmann Institute of Science, 234 Herzl Street, Rehovot 7610001, Israel
- Dian Yang
  - Department of Molecular Pharmacology and Therapeutics, Columbia University, New York City, NY, USA
  - Department of Systems Biology, Columbia University, New York City, NY, USA
  - Herbert Irving Comprehensive Cancer Center, Columbia University, New York City, NY, USA
  - Lead Contact
5
Lange M, Piran Z, Klein M, Spanjaard B, Klein D, Junker JP, Theis FJ, Nitzan M. Mapping lineage-traced cells across time points with moslin. Genome Biol 2024; 25:277. [PMID: 39434128 PMCID: PMC11492637 DOI: 10.1186/s13059-024-03422-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/29/2024] [Accepted: 10/10/2024] [Indexed: 10/23/2024] Open
Abstract
Simultaneous profiling of single-cell gene expression and lineage history holds enormous potential for studying cellular decision-making. Recent computational approaches combine both modalities into cellular trajectories; however, they cannot make use of all available lineage information in destructive time-series experiments. Here, we present moslin, a Gromov-Wasserstein-based model to couple cellular profiles across time points based on lineage and gene expression information. We validate our approach in simulations and demonstrate on Caenorhabditis elegans embryonic development how moslin predicts fate probabilities and putative decision driver genes. Finally, we use moslin to delineate lineage relationships among transiently activated fibroblast states during zebrafish heart regeneration.
Affiliation(s)
- Marius Lange
  - Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
  - Department of Mathematics, Technical University of Munich, Munich, Germany
  - Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- Zoe Piran
  - School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
- Bastiaan Spanjaard
  - Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine, Berlin, Germany
  - Department of Paediatric Oncology/Hematology, Charité-Universitätsmedizin Berlin, Berlin, Germany
- Dominik Klein
  - Department of Mathematics, Technical University of Munich, Munich, Germany
  - Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- Jan Philipp Junker
  - Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine, Berlin, Germany
  - Charité-Universitätsmedizin Berlin, Berlin, Germany
  - DZHK (German Centre for Cardiovascular Research), Partner Site Berlin, Berlin, Germany
- Fabian J Theis
  - Department of Mathematics, Technical University of Munich, Munich, Germany
  - Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
  - TUM School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany
- Mor Nitzan
  - School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
  - Racah Institute of Physics, The Hebrew University of Jerusalem, Jerusalem, Israel
  - Faculty of Medicine, The Hebrew University of Jerusalem, Jerusalem, Israel
6
Sashittal P, Zhang RY, Law BK, Strzalkowski A, Schmidt H, Bolondi A, Chan MM, Raphael BJ. Inferring cell differentiation maps from lineage tracing data. bioRxiv 2024:2024.09.09.611835. [PMID: 39314473 PMCID: PMC11419031 DOI: 10.1101/2024.09.09.611835] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/25/2024]
Abstract
During development, multipotent cells differentiate through a hierarchy of increasingly restricted progenitor cell types until they realize specialized cell types. A cell differentiation map describes this hierarchy, and inferring these maps is an active area of research spanning traditional single marker lineage studies to data-driven trajectory inference methods on single-cell RNA-seq data. Recent high-throughput lineage tracing technologies profile lineages and cell types at scale, but current methods to infer cell differentiation maps from these data rely on simple models with restrictive assumptions about the developmental process. We introduce a mathematical framework for cell differentiation maps based on the concept of potency, and develop an algorithm, Carta, that infers an optimal cell differentiation map from single-cell lineage tracing data. The key insight in Carta is to balance the trade-off between the complexity of the cell differentiation map and the number of unobserved cell type transitions on the lineage tree. We show that Carta more accurately infers cell differentiation maps on both simulated and real data compared to existing methods. In models of mammalian trunk development and mouse hematopoiesis, Carta identifies important features of development that are not revealed by other methods including convergent differentiation of specialized cell types, progenitor differentiation dynamics, and the refinement of routes of differentiation via new intermediate progenitors.
Affiliation(s)
- Palash Sashittal
  - Dept. of Computer Science, Princeton University, Princeton, NJ 08544, USA
- Richard Y. Zhang
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
- Benjamin K. Law
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
  - Dept. of Molecular Biology, Princeton University, Princeton, NJ 08544, USA
- Henri Schmidt
  - Dept. of Computer Science, Princeton University, Princeton, NJ 08544, USA
- Adriano Bolondi
  - Dept. of Genome Regulation, Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany
- Michelle M. Chan
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
  - Dept. of Molecular Biology, Princeton University, Princeton, NJ 08544, USA
7
Liu F, Zhang X, Yang Y. Simulation of CRISPR-Cas9 editing on evolving barcode and accuracy of lineage tracing. Sci Rep 2024; 14:19213. [PMID: 39160220 PMCID: PMC11333585 DOI: 10.1038/s41598-024-70154-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2024] [Accepted: 08/13/2024] [Indexed: 08/21/2024] Open
Abstract
We designed a simulation program that mimics CRISPR-Cas9 editing of an evolving barcode, together with the double-strand break repair procedure, across cell divisions. Emerging barcode mutations tend to build upon previously existing mutations, occurring sequentially with each generation. This process results in a unique mutation profile in each cell. We sample the barcodes in leaf cells and reconstruct the lineage, comparing it to the original lineage tree to test algorithm accuracy under different parameter settings. Our computational simulations validate the reasonable assumptions deduced from experimental observations, emphasizing that factors such as sampling size, barcode length, multiple barcodes, indel probabilities, and Cas9 activity are critical for accurate and successful lineage tracing. Among these factors, we found that sampling size and indel probabilities are the two that most affect lineage tracing accuracy. Large segment deletions in early generations could greatly impact lineage accuracy. These simulation results offer insightful recommendations for enhancing the design and analysis of Cas9-mediated molecular barcodes in actual experiments.
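The kind of simulation this abstract describes can be illustrated with a minimal Python sketch. It is not the authors' program: the function names, the synchronous binary divisions, the single per-site edit probability `p_edit`, and the pool of `n_outcomes` equally likely indel states are all assumptions made for illustration, and large segment deletions and explicit repair modelling are left out.

```python
import random

UNEDITED = 0  # state of a target site that Cas9 has not yet cut


def simulate_barcodes(generations, n_sites=10, p_edit=0.1, n_outcomes=20, seed=0):
    """Simulate irreversible barcode editing over synchronous cell divisions.

    Returns a list of generations; each generation is a list of barcodes
    (tuples of ints), and cell i in one generation is the parent of cells
    2*i and 2*i + 1 in the next. All parameters are hypothetical.
    """
    rng = random.Random(seed)
    history = [[(UNEDITED,) * n_sites]]  # generation 0: one unedited founder
    for _ in range(generations):
        children = []
        for parent in history[-1]:
            for _daughter in range(2):
                child = list(parent)
                for site, state in enumerate(child):
                    # Only unedited sites can still be cut and repaired,
                    # so new edits accumulate on top of inherited ones.
                    if state == UNEDITED and rng.random() < p_edit:
                        child[site] = rng.randint(1, n_outcomes)
                children.append(tuple(child))
        history.append(children)
    return history


def shared_edits(a, b):
    """Count sites where two barcodes carry the same edit (not unedited)."""
    return sum(1 for x, y in zip(a, b) if x == y and x != UNEDITED)


history = simulate_barcodes(generations=6)
leaves = history[-1]
print(len(leaves), "leaf cells sampled")
print("edits shared by the first two sister leaves:", shared_edits(leaves[0], leaves[1]))
```

Because the true parent of every cell is stored in `history`, a reconstruction built from the leaf barcodes alone could be scored against it, and varying `n_sites`, `p_edit` or the number of sampled leaves would probe the sensitivities the abstract lists.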
Affiliation(s)
- Fengshuo Liu
  - Graduate Program in Cancer and Cell Biology, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
- Xiang Zhang
  - Lester and Sue Smith Breast Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
- Yipeng Yang
  - Department of Mathematics and Statistics, University of Houston - Clear Lake, 2700 Bay Area Blvd, Houston, TX, 77058, USA
8
Yoon B, Kim H, Jung SW, Park J. Single-cell lineage tracing approaches to track kidney cell development and maintenance. Kidney Int 2024; 105:1186-1199. [PMID: 38554991 DOI: 10.1016/j.kint.2024.01.045] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 09/08/2023] [Revised: 12/06/2023] [Accepted: 01/09/2024] [Indexed: 04/02/2024]
Abstract
The kidney is a complex organ consisting of various cell types. Previous studies have aimed to elucidate the cellular relationships among these cell types in developing and mature kidneys using Cre-loxP-based lineage tracing. However, this methodology falls short of fully capturing the heterogeneous nature of the kidney, making it less than ideal for comprehensively tracing cellular progression during kidney development and maintenance. Recent technological advancements in single-cell genomics have revolutionized lineage tracing methods. Single-cell lineage tracing enables the simultaneous tracing of multiple cell types within complex tissues and their transcriptomic profiles, thereby allowing the reconstruction of their lineage tree with cell state information. Although single-cell lineage tracing has been successfully applied to investigate cellular hierarchies in various organs and tissues, its application in kidney research is currently lacking. This review comprehensively consolidates the single-cell lineage tracing methods, divided into 4 categories (clustered regularly interspaced short palindromic repeat [CRISPR]/CRISPR-associated protein 9 [Cas9]-based, transposon-based, Polylox-based, and native barcoding methods), and outlines their technical advantages and disadvantages. Furthermore, we propose potential future research topics in kidney research that could benefit from single-cell lineage tracing and suggest suitable technical strategies to apply to these topics.
Affiliation(s)
- Baul Yoon
  - School of Life Sciences, Gwangju Institute of Science and Technology (GIST), Gwangju, Republic of Korea
- Hayoung Kim
  - School of Life Sciences, Gwangju Institute of Science and Technology (GIST), Gwangju, Republic of Korea
- Su Woong Jung
  - Division of Nephrology, Department of Internal Medicine, College of Medicine, Kyung Hee University, Seoul, Republic of Korea
  - Division of Nephrology, Department of Internal Medicine, Kyung Hee University Hospital at Gangdong, Seoul, Republic of Korea
- Jihwan Park
  - School of Life Sciences, Gwangju Institute of Science and Technology (GIST), Gwangju, Republic of Korea
9
Wang K, Hou L, Wang X, Zhai X, Lu Z, Zi Z, Zhai W, He X, Curtis C, Zhou D, Hu Z. PhyloVelo enhances transcriptomic velocity field mapping using monotonically expressed genes. Nat Biotechnol 2024; 42:778-789. [PMID: 37524958 DOI: 10.1038/s41587-023-01887-5] [Citation(s) in RCA: 15] [Impact Index Per Article: 15.0] [Received: 10/24/2022] [Accepted: 06/28/2023] [Indexed: 08/02/2023]
Abstract
Single-cell RNA sequencing (scRNA-seq) is a powerful approach for studying cellular differentiation, but accurately tracking cell fate transitions can be challenging, especially in disease conditions. Here we introduce PhyloVelo, a computational framework that estimates the velocity of transcriptomic dynamics by using monotonically expressed genes (MEGs) or genes with expression patterns that either increase or decrease, but do not cycle, through phylogenetic time. Through integration of scRNA-seq data with lineage information, PhyloVelo identifies MEGs and reconstructs a transcriptomic velocity field. We validate PhyloVelo using simulated data and Caenorhabditis elegans ground truth data, successfully recovering linear, bifurcated and convergent differentiations. Applying PhyloVelo to seven lineage-traced scRNA-seq datasets, generated using CRISPR-Cas9 editing, lentiviral barcoding or immune repertoire profiling, demonstrates its high accuracy and robustness in inferring complex lineage trajectories while outperforming RNA velocity. Additionally, we discovered that MEGs across tissues and organisms share similar functions in translation and ribosome biogenesis.
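To make the notion of a monotonically expressed gene concrete, here is a small Python sketch that flags genes whose expression rises or falls steadily with phylogenetic time. It is only an illustration: the Spearman rank-correlation screen, the `threshold` value, the function names and the toy data are assumptions introduced here, not PhyloVelo's statistical model or API.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation (Pearson correlation of the ranks)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant gene carries no trend
    return cov / (var_x * var_y) ** 0.5


def monotonic_genes(expression, phylo_time, threshold=0.8):
    """Return {gene: rho} for genes with |rho| >= threshold (hypothetical cutoff)."""
    result = {}
    for gene, values in expression.items():
        rho = spearman(values, phylo_time)
        if abs(rho) >= threshold:
            result[gene] = rho
    return result


# Toy data: five cells with hypothetical root-to-cell distances on a lineage tree.
phylo_time = [1, 2, 3, 4, 5]
expression = {
    "rising": [0.1, 0.4, 0.5, 0.9, 1.3],
    "falling": [2.0, 1.5, 1.1, 0.6, 0.2],
    "cycling": [0.2, 1.0, 0.1, 1.1, 0.2],
}
print(monotonic_genes(expression, phylo_time))
```

In this toy input the increasing and decreasing genes pass the screen while the oscillating gene does not, which mirrors the "increase or decrease, but do not cycle" criterion in the abstract.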
Affiliation(s)
- Kun Wang
  - CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
  - School of Mathematical Sciences, Xiamen University, Xiamen, China
- Liangzhen Hou
  - CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
  - Faculty of Health Sciences, University of Macau, Taipa, Macau, China
- Xin Wang
  - CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Xiangwei Zhai
  - MOE Key Laboratory of Gene Function and Regulation, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou, China
- Zhaolian Lu
  - CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Zhike Zi
  - CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Weiwei Zhai
  - CAS Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
  - Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China
- Xionglei He
  - MOE Key Laboratory of Gene Function and Regulation, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou, China
- Christina Curtis
  - Department of Medicine, Division of Oncology, Stanford University School of Medicine, Stanford, CA, USA
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
  - Stanford Cancer Institute, Stanford University School of Medicine, Stanford, CA, USA
- Da Zhou
  - School of Mathematical Sciences, Xiamen University, Xiamen, China
  - National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, China
- Zheng Hu
  - CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
10
Mai U, Chu G, Raphael BJ. Maximum Likelihood Inference of Time-scaled Cell Lineage Trees with Mixed-type Missing Data. bioRxiv 2024:2024.03.05.583638. [PMID: 38496496 PMCID: PMC10942411 DOI: 10.1101/2024.03.05.583638] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2024]
Abstract
Recent dynamic lineage tracing technologies combine CRISPR-based genome editing with single-cell sequencing to track cell divisions during development. A key computational problem in dynamic lineage tracing is to infer a cell lineage tree from the measured CRISPR-induced mutations. Three features of dynamic lineage tracing data distinguish this problem from standard phylogenetic tree inference. First, the CRISPR-editing process modifies a genomic location exactly once. This non-modifiable property is not well described by the time-reversible models commonly used in phylogenetics. Second, as a consequence of non-modifiability, the number of mutations per time unit decreases over time. Third, CRISPR-based genome-editing and single-cell sequencing results in high rates of both heritable and non-heritable (dropout) missing data. To model these features, we introduce the Probabilistic Mixed-type Missing (PMM) model. We describe an algorithm, LAML (Lineage Analysis via Maximum Likelihood), to search for the maximum likelihood (ML) tree under the PMM model. LAML combines an Expectation Maximization (EM) algorithm with a heuristic tree search to jointly estimate tree topology, branch lengths and missing data parameters. We derive a closed-form solution for the M-step in the case of no heritable missing data, and a block coordinate ascent approach in the general case which is more efficient than the standard General Time Reversible (GTR) phylogenetic model. On simulated data, LAML infers more accurate tree topologies and branch lengths than existing methods, with greater advantages on datasets with higher ratios of heritable to non-heritable missing data. We show that LAML provides unbiased time-scaled estimates of branch lengths. 
In contrast, we demonstrate that maximum parsimony methods for lineage tracing data not only underestimate branch lengths, but also yield branch lengths which are not proportional to time, due to the nonlinear decay in the number of mutations on branches further from the root. On lineage tracing data from a mouse model of lung adenocarcinoma, we show that LAML infers phylogenetic distances that are more concordant with gene expression data compared to distances derived from maximum parsimony. The LAML tree topology is more plausible than existing published trees, with fewer total cell migrations between distant metastases and fewer reseeding events where cells migrate back to the primary tumor. Crucially, we identify three distinct time epochs of metastasis progression, which includes a burst of metastasis events to various anatomical sites during a single month.
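The nonlinear decay in mutation counts mentioned in this abstract can be made concrete with a short calculation. This is a simplified sketch, not the PMM model itself: it assumes, purely for illustration, $k$ independent target sites, each edited irreversibly at a constant rate $\lambda$, and no missing data.

```latex
% Assumed toy model: k sites, constant per-site edit rate \lambda, edits are irreversible.
\begin{align}
  \Pr[\text{site unedited at time } t] &= e^{-\lambda t}, \\
  \mathbb{E}[\text{new edits on a branch spanning } (t,\, t+\Delta)]
    &= k \left( e^{-\lambda t} - e^{-\lambda (t+\Delta)} \right)
     = k\, e^{-\lambda t} \left( 1 - e^{-\lambda \Delta} \right).
\end{align}
```

Under these assumptions, two branches of equal duration $\Delta$ carry different expected numbers of edits depending on their start time $t$: the count shrinks by the factor $e^{-\lambda t}$ with distance from the root, so a branch length measured as a raw mutation count is not proportional to elapsed time.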
Affiliation(s)
- Benjamin J. Raphael
  - Department of Computer Science, Princeton University, Princeton, NJ 08544, USA
11
Pan X, Zhang X. Studying temporal dynamics of single cells: expression, lineage and regulatory networks. Biophys Rev 2024; 16:57-67. [PMID: 38495440 PMCID: PMC10937865 DOI: 10.1007/s12551-023-01090-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 04/17/2023] [Accepted: 06/27/2023] [Indexed: 03/19/2024] Open
Abstract
Learning how multicellular organs are developed from single cells to different cell types is a fundamental problem in biology. With the high-throughput scRNA-seq technology, computational methods have been developed to reveal the temporal dynamics of single cells from transcriptomic data, from phenomena on cell trajectories to the underlying mechanism that formed the trajectory. There are several distinct families of computational methods including Trajectory Inference (TI), Lineage Tracing (LT), and Gene Regulatory Network (GRN) Inference which are involved in such studies. This review summarizes these computational approaches which use scRNA-seq data to study cell differentiation and cell fate specification as well as the advantages and limitations of different methods. We further discuss how GRNs can potentially affect cell fate decisions and trajectory structures. Supplementary Information: The online version contains supplementary material available at 10.1007/s12551-023-01090-5.
Affiliation(s)
- Xinhai Pan
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA 30332 USA
| | - Xiuwei Zhang
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA 30332 USA
| |
|
12
|
Li Z, Yang W, Wu P, Shan Y, Zhang X, Chen F, Yang J, Yang JR. Reconstructing cell lineage trees with genomic barcoding: approaches and applications. J Genet Genomics 2024; 51:35-47. [PMID: 37269980 DOI: 10.1016/j.jgg.2023.05.011] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 02/03/2023] [Revised: 05/19/2023] [Accepted: 05/20/2023] [Indexed: 06/05/2023]
Abstract
In multicellular organisms, developmental history of cell divisions and functional annotation of terminal cells can be organized into a cell lineage tree (CLT). The reconstruction of the CLT has long been a major goal in developmental biology and other related fields. Recent technological advancements, especially those in editable genomic barcodes and single-cell high-throughput sequencing, have sparked a new wave of experimental methods for reconstructing CLTs. Here we review the existing experimental approaches to the reconstruction of CLT, which are broadly categorized as either image-based or DNA barcode-based methods. In addition, we present a summary of the related literature based on the biological insight provided by the obtained CLTs. Moreover, we discuss the challenges that will arise as more and better CLT data become available in the near future. Genomic barcoding-based CLT reconstructions and analyses, due to their wide applicability and high scalability, offer the potential for novel biological discoveries, especially those related to general and systemic properties of the developmental process.
Affiliation(s)
- Zizhang Li
- Advanced Medical Technology Center, The First Affiliated Hospital, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong 510080, China; Department of Genetics and Biomedical Informatics, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong 510080, China
| | - Wenjing Yang
- Advanced Medical Technology Center, The First Affiliated Hospital, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong 510080, China
| | - Peng Wu
- Advanced Medical Technology Center, The First Affiliated Hospital, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong 510080, China
| | - Yuyan Shan
- Advanced Medical Technology Center, The First Affiliated Hospital, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong 510080, China
| | - Xiaoyu Zhang
- Advanced Medical Technology Center, The First Affiliated Hospital, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong 510080, China
| | - Feng Chen
- Advanced Medical Technology Center, The First Affiliated Hospital, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong 510080, China; Department of Genetics and Biomedical Informatics, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong 510080, China
| | - Junnan Yang
- Advanced Medical Technology Center, The First Affiliated Hospital, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong 510080, China
| | - Jian-Rong Yang
- Advanced Medical Technology Center, The First Affiliated Hospital, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong 510080, China; Department of Genetics and Biomedical Informatics, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong 510080, China; Key Laboratory of Tropical Disease Control, Ministry of Education, Sun Yat-sen University, Guangzhou, Guangdong 510080, China.
| |
|
13
|
Sashittal P, Schmidt H, Chan M, Raphael BJ. Startle: A star homoplasy approach for CRISPR-Cas9 lineage tracing. Cell Syst 2023; 14:1113-1121.e9. [PMID: 38128483 PMCID: PMC11257033 DOI: 10.1016/j.cels.2023.11.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2023] [Revised: 10/31/2023] [Accepted: 11/17/2023] [Indexed: 12/23/2023]
Abstract
CRISPR-Cas9-based genome editing combined with single-cell sequencing enables the tracing of the history of cell divisions, or cellular lineage, in tissues and whole organisms. Although standard phylogenetic approaches may be applied to reconstruct cellular lineage trees from this data, the unique features of the CRISPR-Cas9 editing process motivate the development of specialized models that describe the evolution of CRISPR-Cas9-induced mutations. Here, we introduce the "star homoplasy" evolutionary model that constrains a phylogenetic character to mutate at most once along a lineage, capturing the "non-modifiability" property of CRISPR-Cas9 mutations. We derive a combinatorial characterization of star homoplasy phylogenies and use this characterization to develop an algorithm, "Startle", that computes a maximum parsimony star homoplasy phylogeny. We demonstrate that Startle infers more accurate phylogenies on simulated lineage tracing data compared with existing methods and finds parsimonious phylogenies with fewer metastatic migrations on lineage tracing data from mouse metastatic lung adenocarcinoma.
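The "mutate at most once along a lineage" constraint described in this abstract can be checked on a fully labeled tree with a few lines of code. The sketch below is an illustration of that constraint only, not the Startle algorithm; the tree encoding (a child-to-parent dictionary), the use of `0` for the unedited state, and the function name are assumptions made for this example.

```python
def respects_non_modifiability(parent, states, unedited=0):
    """Check that no character changes again after it has been edited.

    parent: dict mapping each node to its parent (the root maps to None).
    states: dict mapping each node to a tuple of character states.
    A labeling passes if, on every edge, a character that is already edited
    in the parent keeps exactly the same state in the child. This implies
    each character mutates at most once on any root-to-leaf path.
    """
    for child, par in parent.items():
        if par is None:
            continue  # the root has no incoming edge to check
        for parent_state, child_state in zip(states[par], states[child]):
            if parent_state != unedited and child_state != parent_state:
                return False
    return True


if __name__ == "__main__":
    # Hypothetical four-node tree: root -> a -> {b, c}.
    parent = {"root": None, "a": "root", "b": "a", "c": "a"}
    valid = {"root": (0, 0), "a": (1, 0), "b": (1, 2), "c": (1, 2)}
    invalid = {"root": (0, 0), "a": (1, 0), "b": (3, 2), "c": (1, 2)}
    print(respects_non_modifiability(parent, valid))
    print(respects_non_modifiability(parent, invalid))
```

In the first labeling, state 2 arises independently in `b` and `c`, which this check permits because each occurrence starts from the unedited state; the second labeling fails because node `b` overwrites an already edited character.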
Affiliation(s)
- Palash Sashittal
- Department of Computer Science, Princeton University, Princeton, NJ, USA
| | - Henri Schmidt
- Department of Computer Science, Princeton University, Princeton, NJ, USA
| | - Michelle Chan
- Department of Molecular Biology, Princeton University, Princeton, NJ, USA
| | - Benjamin J Raphael
- Department of Computer Science, Princeton University, Princeton, NJ, USA.
| |
|
14
|
Pan X, Li H, Putta P, Zhang X. LinRace: cell division history reconstruction of single cells using paired lineage barcode and gene expression data. Nat Commun 2023; 14:8388. [PMID: 38104156 PMCID: PMC10725445 DOI: 10.1038/s41467-023-44173-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 04/04/2023] [Accepted: 12/03/2023] [Indexed: 12/19/2023] Open
Abstract
Lineage tracing technology using CRISPR/Cas9 genome editing has enabled simultaneous readouts of gene expressions and lineage barcodes in single cells, which allows for inference of cell lineage and cell types at the whole organism level. While most state-of-the-art methods for lineage reconstruction utilize only the lineage barcode data, methods that incorporate gene expressions are emerging. Effectively incorporating the gene expression data requires a reasonable model of how gene expression data changes along generations of divisions. Here, we present LinRace (Lineage Reconstruction with asymmetric cell division model), which integrates lineage barcode and gene expression data using asymmetric cell division model and infers cell lineages and ancestral cell states using Neighbor-Joining and maximum-likelihood heuristics. On both simulated and real data, LinRace outputs more accurate cell division trees than existing methods. With inferred ancestral states, LinRace can also show how a progenitor cell generates a large population of cells with various functionalities.
Affiliation(s)
- Xinhai Pan
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, 30332, USA
| | - Hechen Li
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, 30332, USA
| | - Pranav Putta
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, 30332, USA
| | - Xiuwei Zhang
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, 30332, USA.
| |
|
15
|
Prillo S, Ravoor A, Yosef N, Song YS. ConvexML: Scalable and accurate inference of single-cell chronograms from CRISPR/Cas9 lineage tracing data. bioRxiv 2023:2023.12.03.569785. [PMID: 38076815 PMCID: PMC10705529 DOI: 10.1101/2023.12.03.569785] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/19/2023]
Abstract
CRISPR/Cas9 gene editing technology has enabled lineage tracing for thousands of cells in vivo. However, most of the analysis of CRISPR/Cas9 lineage tracing data has so far been limited to the reconstruction of single-cell tree topologies, which depict lineage relationships between cells, but not the amount of time that has passed between ancestral cell states and the present. Time-resolved trees, known as chronograms, would allow one to study the evolutionary dynamics of cell populations at an unprecedented level of resolution. Indeed, time-resolved trees would reveal the timing of events on the tree, the relative fitness of subclones, and the dynamics underlying phenotypic changes in the cell population, among other important applications. In this work, we introduce the first scalable and accurate method to refine any given single-cell tree topology into a single-cell chronogram by estimating its branch lengths. To do this, we leverage a statistical model of CRISPR/Cas9 cutting with missing data, paired with a conservative version of maximum parsimony that reconstructs only the ancestral states that we are confident about. As part of our method, we propose a novel approach to represent and handle missing data (specifically, double-resection events), which greatly simplifies and speeds up branch length estimation without compromising quality. All this leads to a convex maximum likelihood estimation (MLE) problem that can be readily solved in seconds with off-the-shelf convex optimization solvers. To stabilize estimates in low-information regimes, we propose a simple penalized version of MLE using a minimum branch length and pseudocounts. We benchmark our method using simulations and show that it performs well on several tasks, outperforming more naive baselines. Our method, which we name 'ConvexML', is available through the cassiopeia open source Python package.
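To give a sense of how a branch length can be estimated by maximum likelihood and stabilized with pseudocounts and a minimum length, here is a toy single-branch calculation. It is not the ConvexML implementation and does not use the cassiopeia API. It assumes a known constant edit rate and that, of `n_unedited_start` sites unedited at the top of a branch, `n_newly_edited` are edited by its end, giving the binomial log-likelihood k*log(1 - exp(-r*t)) + (n - k)*(-r*t), which is concave in t and maximized at t = -log(1 - k/n) / r. The way pseudocounts enter below is one simple choice made for illustration.

```python
import math


def branch_length_mle(n_unedited_start, n_newly_edited, edit_rate,
                      pseudocount=0.0, min_length=0.0):
    """Toy maximum likelihood estimate of one branch length.

    Sites are assumed to be edited independently and irreversibly at
    `edit_rate`. `pseudocount` adds that many imaginary edited and unedited
    sites, which keeps the estimate finite when every site was edited.
    `min_length` is a floor applied to the estimate.
    """
    if not 0 <= n_newly_edited <= n_unedited_start:
        raise ValueError("need 0 <= n_newly_edited <= n_unedited_start")
    if edit_rate <= 0:
        raise ValueError("edit_rate must be positive")
    total = n_unedited_start + 2.0 * pseudocount
    edited = n_newly_edited + pseudocount
    if total == 0:
        raise ValueError("no sites available to estimate from")
    fraction_edited = edited / total
    if fraction_edited >= 1.0:
        return math.inf  # all sites edited and no pseudocount: estimate diverges
    return max(min_length, -math.log(1.0 - fraction_edited) / edit_rate)


if __name__ == "__main__":
    # Hypothetical counts chosen only for illustration.
    print(branch_length_mle(10, 5, edit_rate=1.0))
    print(branch_length_mle(10, 10, edit_rate=1.0, pseudocount=1.0))
    print(branch_length_mle(10, 0, edit_rate=1.0, min_length=0.01))
```

The second and third calls show the two low-information cases: without a pseudocount a fully edited branch has an infinite estimate, and without a minimum length a branch with no edits collapses to zero.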
Affiliation(s)
| | - Akshay Ravoor
- Computer Science Division, University of California, Berkeley
| | - Nir Yosef
- Computer Science Division, University of California, Berkeley
- Department of Systems Immunology, Weizmann Institute of Science
| | - Yun S. Song
- Computer Science Division, University of California, Berkeley
- Department of Statistics, University of California, Berkeley
| |
|
16
|
Prusokiene A, Prusokas A, Retkute R. Machine learning based lineage tree reconstruction improved with knowledge of higher level relationships between cells and genomic barcodes. NAR Genom Bioinform 2023; 5:lqad077. [PMID: 37608801 PMCID: PMC10440785 DOI: 10.1093/nargab/lqad077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2023] [Revised: 06/26/2023] [Accepted: 08/11/2023] [Indexed: 08/24/2023] Open
Abstract
Tracking cells as they divide and progress through differentiation is a fundamental step in understanding many biological processes, such as the development of organisms and progression of diseases. In this study, we investigate a machine learning approach to reconstruct lineage trees in experimental systems based on mutating synthetic genomic barcodes. We refine previously proposed methodology by embedding information of higher level relationships between cells and single-cell barcode values into a feature space. We test performance of the algorithm on shallow trees (up to 100 cells) and deep trees (up to 10 000 cells). Our proposed algorithm can improve tree reconstruction accuracy in comparison to reconstructions based on a maximum parsimony method, but this comes at a higher computational time requirement.
Affiliation(s)
- Alisa Prusokiene
- School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne NE1 7RU, UK
| | | | - Renata Retkute
- Department of Plant Sciences, University of Cambridge, Downing Street, Cambridge CB2 3EA, UK
| |
|
17
|
Xie L, Liu H, You Z, Wang L, Li Y, Zhang X, Ji X, He H, Yuan T, Zheng W, Wu Z, Xiong M, Wei W, Chen Y. Comprehensive spatiotemporal mapping of single-cell lineages in developing mouse brain by CRISPR-based barcoding. Nat Methods 2023; 20:1244-1255. [PMID: 37460718 DOI: 10.1038/s41592-023-01947-3] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 06/27/2022] [Accepted: 06/06/2023] [Indexed: 08/09/2023]
Abstract
A fundamental interest in developmental neuroscience lies in the ability to map the complete single-cell lineages within the brain. To this end, we developed a CRISPR editing-based lineage-specific tracing (CREST) method for clonal tracing in Cre mice. We then used two complementary strategies based on CREST to map single-cell lineages in developing mouse ventral midbrain (vMB). By applying snapshotting CREST (snapCREST), we constructed a spatiotemporal lineage landscape of developing vMB and identified six progenitor archetypes that could represent the principal clonal fates of individual vMB progenitors and three distinct clonal lineages in the floor plate that specified glutamatergic, dopaminergic or both neurons. We further created pandaCREST (progenitor and derivative associating CREST) to associate the transcriptomes of progenitor cells in vivo with their differentiation potentials. We identified multiple origins of dopaminergic neurons and demonstrated that a transcriptome-defined progenitor type comprises heterogeneous progenitors, each with distinct clonal fates and molecular signatures. Therefore, the CREST method and strategies allow comprehensive single-cell lineage analysis that could offer new insights into the molecular programs underlying neural specification.
Affiliation(s)
- Lianshun Xie
- Institute of Neuroscience, Key Laboratory of Neuroscience, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Hengxin Liu
- University of Chinese Academy of Sciences, Beijing, China
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai, China
| | - Zhiwen You
- Institute of Neuroscience, Key Laboratory of Neuroscience, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Luyue Wang
- University of Chinese Academy of Sciences, Beijing, China
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai, China
| | - Yiwen Li
- Institute of Neuroscience, Key Laboratory of Neuroscience, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, China
| | - Xinyue Zhang
- Institute of Neuroscience, Key Laboratory of Neuroscience, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Xiaoshan Ji
- Department of Neonatology, Children's Hospital of Fudan University, National Children's Medical Center, Shanghai, China
| | - Hui He
- Institute of Neuroscience, Key Laboratory of Neuroscience, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, China
| | - Tingli Yuan
- Institute of Neuroscience, Key Laboratory of Neuroscience, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, China
| | - Wenping Zheng
- Institute of Neuroscience, Key Laboratory of Neuroscience, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, China
| | - Ziyan Wu
- UniXell Biotechnology, Shanghai, China
| | - Man Xiong
- State Key Laboratory of Medical Neurobiology-Ministry of Education Frontiers Center for Brain Science, Institutes of Brain Science, Fudan University, Shanghai, China
| | - Wu Wei
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai, China.
- Center for Biomedical Informatics, Shanghai Engineering Research Center for Big Data in Pediatric Precision Medicine, Shanghai Children's Hospital, Shanghai Jiao Tong University, Shanghai, China.
- Lingang Laboratory, Shanghai, China.
| | - Yuejun Chen
- Institute of Neuroscience, Key Laboratory of Neuroscience, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, China.
- Shanghai Center for Brain Science and Brain-Inspired Intelligence Technology, Shanghai, China.
| |
|
18
|
Heumos L, Schaar AC, Lance C, Litinetskaya A, Drost F, Zappia L, Lücken MD, Strobl DC, Henao J, Curion F, Schiller HB, Theis FJ. Best practices for single-cell analysis across modalities. Nat Rev Genet 2023; 24:550-572. [PMID: 37002403 PMCID: PMC10066026 DOI: 10.1038/s41576-023-00586-w] [Citation(s) in RCA: 366] [Impact Index Per Article: 183.0] [Accepted: 02/14/2023] [Indexed: 04/03/2023]
Abstract
Recent advances in single-cell technologies have enabled high-throughput molecular profiling of cells across modalities and locations. Single-cell transcriptomics data can now be complemented by chromatin accessibility, surface protein expression, adaptive immune receptor repertoire profiling and spatial information. The increasing availability of single-cell data across modalities has motivated the development of novel computational methods to help analysts derive biological insights. As the field grows, it becomes increasingly difficult to navigate the vast landscape of tools and analysis steps. Here, we summarize independent benchmarking studies of unimodal and multimodal single-cell analysis across modalities to suggest comprehensive best-practice workflows for the most common analysis steps. Where independent benchmarks are not available, we review and contrast popular methods. Our article serves as an entry point for novices in the field of single-cell (multi-)omic analysis and guides advanced users to the most recent best practices.
Affiliation(s)
- Lukas Heumos
- Institute of Computational Biology, Department of Computational Health, Helmholtz Munich, Munich, Germany
- Institute of Lung Health and Immunity and Comprehensive Pneumology Center, Helmholtz Munich; Member of the German Center for Lung Research (DZL), Munich, Germany
- TUM School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany
| | - Anna C Schaar
- Institute of Computational Biology, Department of Computational Health, Helmholtz Munich, Munich, Germany
- Department of Mathematics, School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Munich Center for Machine Learning, Technical University of Munich, Garching, Germany
| | - Christopher Lance
- Institute of Computational Biology, Department of Computational Health, Helmholtz Munich, Munich, Germany
- Department of Paediatrics, Dr von Hauner Children's Hospital, University Hospital, Ludwig-Maximilians-Universität München, Munich, Germany
| | - Anastasia Litinetskaya
- Institute of Computational Biology, Department of Computational Health, Helmholtz Munich, Munich, Germany
- Department of Mathematics, School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
| | - Felix Drost
- Institute of Computational Biology, Department of Computational Health, Helmholtz Munich, Munich, Germany
- TUM School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany
| | - Luke Zappia
- Institute of Computational Biology, Department of Computational Health, Helmholtz Munich, Munich, Germany
- Department of Mathematics, School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
| | - Malte D Lücken
- Institute of Computational Biology, Department of Computational Health, Helmholtz Munich, Munich, Germany
- Institute of Lung Health and Immunity, Helmholtz Munich, Munich, Germany
| | - Daniel C Strobl
- Institute of Computational Biology, Department of Computational Health, Helmholtz Munich, Munich, Germany
- TUM School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany
- Institute of Clinical Chemistry and Pathobiochemistry, School of Medicine, Technical University of Munich, Munich, Germany
- TranslaTUM, Center for Translational Cancer Research, Technical University of Munich, Munich, Germany
| | - Juan Henao
- Institute of Computational Biology, Department of Computational Health, Helmholtz Munich, Munich, Germany
| | - Fabiola Curion
- Institute of Computational Biology, Department of Computational Health, Helmholtz Munich, Munich, Germany
- Department of Mathematics, School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
| | - Herbert B Schiller
- Institute of Lung Health and Immunity and Comprehensive Pneumology Center, Helmholtz Munich; Member of the German Center for Lung Research (DZL), Munich, Germany
| | - Fabian J Theis
- Institute of Computational Biology, Department of Computational Health, Helmholtz Munich, Munich, Germany.
- TUM School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany.
- Department of Mathematics, School of Computation, Information and Technology, Technical University of Munich, Garching, Germany.
- Munich Center for Machine Learning, Technical University of Munich, Garching, Germany.
| |
|
19
|
Natesan G, Hamilton T, Deeds EJ, Shah PK. Novel metrics reveal new structure and unappreciated heterogeneity in C. elegans development. bioRxiv 2023:2023.05.12.540617. [PMID: 37292606 PMCID: PMC10245744 DOI: 10.1101/2023.05.12.540617] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
High throughput experimental approaches are increasingly allowing for the quantitative description of cellular and organismal phenotypes. Distilling these large volumes of complex data into meaningful measures that can drive biological insight remains a central challenge. In the quantitative study of development, for instance, one can resolve phenotypic measures for single cells onto their lineage history, enabling joint consideration of heritable signals and cell fate decisions. Most attempts to analyze this type of data, however, discard much of the information content contained within lineage trees. In this work we introduce a generalized metric, which we term the branch distance, that allows us to compare any two embryos based on phenotypic measurements in individual cells. This approach aligns those phenotypic measurements to the underlying lineage tree, providing a flexible and intuitive framework for quantitative comparisons between, for instance, Wild-Type (WT) and mutant developmental programs. We apply this novel metric to data on cell-cycle timing from over 1300 WT and RNAi-treated Caenorhabditis elegans embryos. Our new metric revealed surprising heterogeneity within this data set, including subtle batch effects in WT embryos and dramatic variability in RNAi-induced developmental phenotypes, all of which had been missed in previous analyses. Further investigation of these results suggests a novel, quantitative link between pathways that govern cell fate decisions and pathways that pattern cell cycle timing in the early embryo. Our work demonstrates that the branch distance we propose, and similar metrics like it, have the potential to revolutionize our quantitative understanding of organismal phenotype.
Affiliation(s)
- Gunalan Natesan
- Department of Molecular, Cell and Developmental Biology, University of California, Los Angeles, CA
| | - Timothy Hamilton
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, CA
| | - Eric J. Deeds
- Department of Integrative Biology and Physiology, University of California, Los Angeles, CA
- Institute for Quantitative and Computational Biosciences, University of California, Los Angeles, CA
| | - Pavak K. Shah
- Department of Molecular, Cell and Developmental Biology, University of California, Los Angeles, CA
- Institute for Quantitative and Computational Biosciences, University of California, Los Angeles, CA
| |
|
20
|
Pan X, Li H, Putta P, Zhang X. LinRace: single cell lineage reconstruction using paired lineage barcode and gene expression data. bioRxiv 2023:2023.04.12.536601. [PMID: 37090498 PMCID: PMC10120693 DOI: 10.1101/2023.04.12.536601] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 04/25/2023]
Abstract
Understanding how single cells divide and differentiate into different cell types in developed organs is one of the major tasks of developmental and stem cell biology. Recently, lineage tracing technology using CRISPR/Cas9 genome editing has enabled simultaneous readouts of gene expressions and lineage barcodes in single cells, which allows for the reconstruction of the cell division tree, and even the detection of cell types and differentiation trajectories at the whole organism level. While most state-of-the-art methods for lineage reconstruction utilize only the lineage barcode data, methods that incorporate gene expression data are emerging, aiming to improve the accuracy of lineage reconstruction. However, effectively incorporating the gene expression data requires a reasonable model on how gene expression data changes along generations of divisions. Here, we present LinRace (Lineage Reconstruction with asymmetric cell division model), a method that integrates the lineage barcode and gene expression data using the asymmetric cell division model and infers cell lineage under a framework combining Neighbor Joining and maximum-likelihood heuristics. On both simulated and real data, LinRace outputs more accurate cell division trees than existing methods. Moreover, LinRace can output the cell states (cell types) of ancestral cells, which is rarely performed with existing lineage reconstruction methods. The information on ancestral cells can be used to analyze how a progenitor cell generates a large population of cells with various functionalities. LinRace is available at: https://github.com/ZhangLabGT/LinRace.
Affiliation(s)
- Xinhai Pan
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta GA 30332, USA
| | - Hechen Li
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta GA 30332, USA
| | - Pranav Putta
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta GA 30332, USA
| | - Xiuwei Zhang
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta GA 30332, USA
| |
|
21
|
Fang W, Bell CM, Sapirstein A, Asami S, Leeper K, Zack DJ, Ji H, Kalhor R. Quantitative fate mapping: A general framework for analyzing progenitor state dynamics via retrospective lineage barcoding. Cell 2022; 185:4604-4620.e32. [PMID: 36423582 PMCID: PMC9708097 DOI: 10.1016/j.cell.2022.10.028] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 02/17/2022] [Revised: 08/23/2022] [Accepted: 10/26/2022] [Indexed: 11/24/2022]
Abstract
Natural and induced somatic mutations that accumulate in the genome during development record the phylogenetic relationships of cells; whether these lineage barcodes capture the complex dynamics of progenitor states remains unclear. We introduce quantitative fate mapping, an approach to reconstruct the hierarchy, commitment times, population sizes, and commitment biases of intermediate progenitor states during development based on a time-scaled phylogeny of their descendants. To reconstruct time-scaled phylogenies from lineage barcodes, we introduce Phylotime, a scalable maximum likelihood clustering approach based on a general barcoding mutagenesis model. We validate these approaches using realistic in silico and in vitro barcoding experiments. We further establish criteria for the number of cells that must be analyzed for robust quantitative fate mapping and a progenitor state coverage statistic to assess the robustness. This work demonstrates how lineage barcodes, natural or synthetic, enable analyzing progenitor fate and dynamics long after embryonic development in any organism.
Affiliation(s)
- Weixiang Fang
- Department of Biomedical Engineering, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA; Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA; Center for Epigenetics, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
| | - Claire M Bell
- Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA; Department of Ophthalmology, Wilmer Eye Institute, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
| | - Abel Sapirstein
- Department of Biomedical Engineering, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA; Center for Epigenetics, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA; H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA
| | - Soichiro Asami
- Department of Biomedical Engineering, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA; Center for Epigenetics, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
| | - Kathleen Leeper
- Department of Biomedical Engineering, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA; Center for Epigenetics, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
| | - Donald J Zack
- Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA; Department of Ophthalmology, Wilmer Eye Institute, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA; Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA; Department of Molecular Biology and Genetics, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
| | - Hongkai Ji
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA.
| | - Reza Kalhor
- Department of Biomedical Engineering, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA; Center for Epigenetics, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA; Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA; Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA; Department of Molecular Biology and Genetics, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA; Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA.
|
22
|
Seidel S, Stadler T. TiDeTree: a Bayesian phylogenetic framework to estimate single-cell trees and population dynamic parameters from genetic lineage tracing data. Proc Biol Sci 2022; 289:20221844. [PMID: 36350216 PMCID: PMC9653226 DOI: 10.1098/rspb.2022.1844] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 11/11/2022] Open
Abstract
The development of organisms and tissues is dictated by an elaborate balance between cell division, apoptosis and differentiation: the cell population dynamics. To quantify these dynamics, we propose a phylodynamic inference approach based on single-cell lineage recorder data. We developed a Bayesian phylogenetic framework, time-scaled developmental trees (TiDeTree), that uses lineage recorder data to estimate time-scaled single-cell trees. By implementing TiDeTree within BEAST 2, we enable joint inference of the time-scaled trees and the cell population dynamics. We validated TiDeTree using simulations and showed that performance further improves when multiple independent sources of information, such as frequencies of editing outcomes or experimental replicates, are included in the inference. We benchmarked TiDeTree against state-of-the-art methods and showed comparable performance in terms of tree topology, plus direct assessment of uncertainty and co-estimation of additional parameters. To demonstrate TiDeTree's use in practice, we analysed a public dataset containing lineage data from approximately 100 stem cell colonies. We estimated a time-scaled phylogeny for each colony, as well as the cell division and apoptosis rates underlying the growth dynamics of all colonies. We envision that TiDeTree will find broad application in the analysis of single-cell lineage tracing data, which will improve our understanding of cellular processes during development.
Affiliation(s)
- Sophie Seidel
- Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland,Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
| | - Tanja Stadler
- Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland,Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
|
23
|
Chen C, Liao Y, Peng G. Connecting past and present: single-cell lineage tracing. Protein Cell 2022; 13:790-807. [PMID: 35441356 PMCID: PMC9237189 DOI: 10.1007/s13238-022-00913-7] [Citation(s) in RCA: 41] [Impact Index Per Article: 13.7] [Received: 12/03/2021] [Accepted: 03/06/2022] [Indexed: 01/16/2023] Open
Abstract
Depicting the history, state and fate of cells is central to cell theory and a fundamental goal in modern biology. By leveraging clonal analysis and single-cell RNA-seq technologies, single-cell lineage tracing provides new opportunities to interrogate both cell states and lineage histories. During the past few years, many strategies to achieve lineage tracing at single-cell resolution have been developed, and three of them (integration barcodes, polylox barcodes, and CRISPR barcodes) are noteworthy because they can be applied in experimentally tractable systems. Although these strategies have been demonstrated in animal development and stem cell research, much care and effort are still required to implement them. Here we review the development of single-cell lineage tracing, the major characteristics of the cell barcoding strategies, their applications, and their technical considerations and limitations, providing a guide for choosing or improving a single-cell barcoding strategy for lineage tracing.
Affiliation(s)
- Cheng Chen
- Center for Cell Lineage and Development, CAS Key Laboratory of Regenerative Biology, Guangdong Provincial Key Laboratory of Stem Cell and Regenerative Medicine, GIBH-HKU Guangdong-Hong Kong Stem Cell and Regenerative Medicine Research Centre, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou, 510530, China
| | - Yuanxin Liao
- Center for Cell Lineage and Development, CAS Key Laboratory of Regenerative Biology, Guangdong Provincial Key Laboratory of Stem Cell and Regenerative Medicine, GIBH-HKU Guangdong-Hong Kong Stem Cell and Regenerative Medicine Research Centre, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou, 510530, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Guangdun Peng
- Center for Cell Lineage and Development, CAS Key Laboratory of Regenerative Biology, Guangdong Provincial Key Laboratory of Stem Cell and Regenerative Medicine, GIBH-HKU Guangdong-Hong Kong Stem Cell and Regenerative Medicine Research Centre, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou, 510530, China.
- Center for Cell Lineage and Atlas, Bioland Laboratory (Guangzhou Regenerative Medicine and Health Guangdong Laboratory), Guangzhou, 510005, China.
- Institute for Stem Cell and Regeneration, Chinese Academy of Sciences, Beijing, 100101, China.
|
24
|
Choi J, Chen W, Minkina A, Chardon FM, Suiter CC, Regalado SG, Domcke S, Hamazaki N, Lee C, Martin B, Daza RM, Shendure J. A time-resolved, multi-symbol molecular recorder via sequential genome editing. Nature 2022; 608:98-107. [PMID: 35794474 PMCID: PMC9352581 DOI: 10.1038/s41586-022-04922-8] [Citation(s) in RCA: 92] [Impact Index Per Article: 30.7] [Received: 09/16/2021] [Accepted: 05/31/2022] [Indexed: 01/07/2023]
Abstract
DNA is naturally well suited to serve as a digital medium for in vivo molecular recording. However, contemporary DNA-based memory devices are constrained in terms of the number of distinct 'symbols' that can be concurrently recorded and/or by a failure to capture the order in which events occur [1]. Here we describe DNA Typewriter, a general system for in vivo molecular recording that overcomes these and other limitations. For DNA Typewriter, the blank recording medium ('DNA Tape') consists of a tandem array of partial CRISPR-Cas9 target sites, with all but the first site truncated at their 5' ends and therefore inactive. Short insertional edits serve as symbols that record the identity of the prime editing guide RNA [2] mediating the edit while also shifting the position of the 'type guide' by one unit along the DNA Tape, that is, sequential genome editing. In this proof of concept of DNA Typewriter, we demonstrate recording and decoding of thousands of symbols, complex event histories and short text messages; evaluate the performance of dozens of orthogonal tapes; and construct 'long tape' potentially capable of recording as many as 20 serial events. Finally, we leverage DNA Typewriter in conjunction with single-cell RNA-seq to reconstruct a monophyletic lineage of 3,257 cells and find that the Poisson-like accumulation of sequential edits to multicopy DNA tape can be maintained across at least 20 generations and 25 days of in vitro clonal expansion.
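The sequential-editing idea in this abstract (only one site is writable at a time, and each edit both stores a symbol and activates the next site) can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not the authors' method: the function names `record` and `decode` are hypothetical, every event is assumed to produce exactly one edit, and editing efficiency, multicopy tapes and sequencing errors are ignored.

```python
def record(events, tape_length):
    """Write events onto a tape in arrival order (toy model).

    Only one site is active at a time. Writing a symbol fills the active
    site and shifts the active position one unit to the right, so the
    physical order of symbols on the tape mirrors the order of events.
    """
    tape = [None] * tape_length  # None marks a blank, not-yet-edited site
    active = 0                   # index of the single active site
    for symbol in events:
        if active >= tape_length:
            break                # tape is full; later events are not recorded
        tape[active] = symbol
        active += 1
    return tape


def decode(tape):
    """Read the recorded symbols back in the order they were written."""
    return [symbol for symbol in tape if symbol is not None]


if __name__ == "__main__":
    tape = record(list("HELLO"), tape_length=8)
    print(tape)
    print("".join(decode(tape)))
```

Because the write position only ever moves in one direction, event order can be read from position alone, which is the property the abstract contrasts with unordered recorders.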
Affiliation(s)
- Junhong Choi
- Department of Genome Sciences, University of Washington, Seattle, WA, USA.
- Howard Hughes Medical Institute, Seattle, WA, USA.
| | - Wei Chen
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Molecular Engineering and Sciences Institute, University of Washington, Seattle, WA, USA
| | - Anna Minkina
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Florence M Chardon
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Chase C Suiter
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Molecular and Cellular Biology Program, University of Washington, Seattle, WA, USA
| | - Samuel G Regalado
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Silvia Domcke
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Nobuhiko Hamazaki
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Howard Hughes Medical Institute, Seattle, WA, USA
| | - Choli Lee
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Beth Martin
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Riza M Daza
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle, WA, USA.
- Howard Hughes Medical Institute, Seattle, WA, USA.
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA.
- Allen Discovery Center for Cell Lineage Tracing, Seattle, WA, USA.
|
25
|
Abstract
DNA tapes could be used to record dynamic molecular and cellular events in animals.
Affiliation(s)
- Nanami Masuyama
- School of Biomedical Engineering, Faculty of Applied Science and Faculty of Medicine, The University of British Columbia, Vancouver, BC, Canada.,Institute for Advanced Biosciences, Keio University, Yamagata, Japan
| | - Naoki Konno
- Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Tokyo, Japan
| | - Nozomu Yachie
- School of Biomedical Engineering, Faculty of Applied Science and Faculty of Medicine, The University of British Columbia, Vancouver, BC, Canada.,Research Center for Advanced Science and Technology, The University of Tokyo, Tokyo, Japan
|
26
|
Anderson DJ, Pauler FM, McKenna A, Shendure J, Hippenmeyer S, Horwitz MS. Simultaneous brain cell type and lineage determined by scRNA-seq reveals stereotyped cortical development. Cell Syst 2022; 13:438-453.e5. [PMID: 35452605 DOI: 10.1016/j.cels.2022.03.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2021] [Revised: 01/21/2022] [Accepted: 03/30/2022] [Indexed: 11/30/2022]
Abstract
Mutations are acquired frequently, such that each cell's genome inscribes its history of cell divisions. Common genomic alterations involve loss of heterozygosity (LOH). LOH accumulates throughout the genome, offering large encoding capacity for inferring cell lineage. Using only single-cell RNA sequencing (scRNA-seq) of mouse brain cells, we found that LOH events spanning multiple genes are revealed as tracts of monoallelically expressed, constitutionally heterozygous single-nucleotide variants (SNVs). We simultaneously inferred cell lineage and marked developmental time points based on X chromosome inactivation and the total number of LOH events while identifying cell types from gene expression patterns. Our results are consistent with progenitor cells giving rise to multiple cortical cell types through stereotyped expansion and distinct waves of neurogenesis. This type of retrospective analysis could be incorporated into scRNA-seq pipelines and, compared with experimental approaches for determining lineage in model organisms, is applicable where genetic engineering is prohibited, such as humans.
Affiliation(s)
- Donovan J Anderson
- Allen Discovery Center for Lineage Tracing and Department of Laboratory Medicine & Pathology, University of Washington, Seattle, WA 98109, USA
| | - Florian M Pauler
- Institute of Science and Technology Austria, Am Campus 1, 3400 Klosterneuburg, Austria
| | | | - Jay Shendure
- Allen Discovery Center for Lineage Tracing, Department of Genome Sciences, and Howard Hughes Medical Institute, University of Washington, Seattle, WA 98109, USA
| | - Simon Hippenmeyer
- Institute of Science and Technology Austria, Am Campus 1, 3400 Klosterneuburg, Austria
| | - Marshall S Horwitz
- Allen Discovery Center for Lineage Tracing and Department of Laboratory Medicine & Pathology, University of Washington, Seattle, WA 98109, USA.
|
27
|
Pan X, Li H, Zhang X. TedSim: temporal dynamics simulation of single-cell RNA sequencing data and cell division history. Nucleic Acids Res 2022; 50:4272-4288. [PMID: 35412632 PMCID: PMC9071466 DOI: 10.1093/nar/gkac235] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 10/08/2021] [Revised: 03/23/2022] [Accepted: 03/31/2022] [Indexed: 11/18/2022] Open
Abstract
Recently, lineage tracing technology using CRISPR/Cas9 genome editing has enabled simultaneous readouts of gene expressions and lineage barcodes, which allows for the reconstruction of the cell division tree and makes it possible to reconstruct ancestral cell types and trace the origin of each cell type. Meanwhile, trajectory inference methods are widely used to infer cell trajectories and pseudotime in a dynamic process using gene expression data of present-day cells. Here, we present TedSim (single-cell temporal dynamics simulator), which simulates the cell division events from the root cell to present-day cells, simultaneously generating two data modalities for each single cell: the lineage barcode and gene expression data. TedSim is a framework that connects the two problems: lineage tracing and trajectory inference. Using TedSim, we conducted analysis to show that (i) TedSim generates realistic gene expression and barcode data, as well as realistic relationships between these two data modalities; (ii) trajectory inference methods can recover the underlying cell state transition mechanism with balanced cell type compositions; and (iii) integrating gene expression and barcode data can provide more insights into the temporal dynamics in cell differentiation compared to using only one type of data, but better integration methods need to be developed.
Affiliation(s)
- Xinhai Pan
- School of Computational Science and Engineering, College of Computing, Georgia Institute of Technology, Atlanta, GA 30332, USA
| | - Hechen Li
- School of Computational Science and Engineering, College of Computing, Georgia Institute of Technology, Atlanta, GA 30332, USA
| | - Xiuwei Zhang
- School of Computational Science and Engineering, College of Computing, Georgia Institute of Technology, Atlanta, GA 30332, USA
|
28
|
Gong W, Kim HJ, Garry DJ, Kwak IY. Single cell lineage reconstruction using distance-based algorithms and the R package, DCLEAR. BMC Bioinformatics 2022; 23:103. [PMID: 35331133 PMCID: PMC8944039 DOI: 10.1186/s12859-022-04633-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 02/03/2022] [Accepted: 03/13/2022] [Indexed: 11/10/2022] Open
Abstract
Background: DCLEAR is an R package used for single cell lineage reconstruction. The advances of CRISPR-based gene editing technologies have enabled the prediction of cell lineage trees based on observed edited barcodes from each cell. However, the performance of existing reconstruction methods of cell lineage trees was not assessed until recently. In response to this problem, the Allen Institute hosted the Cell Lineage Reconstruction Dream Challenge in 2020 to crowdsource relevant knowledge from across the world. Our team won sub-challenges 2 and 3 in the challenge competition. Results: The DCLEAR package contains the R code that was submitted in response to sub-challenges 2 and 3. Our method consists of two steps: (1) distance matrix estimation and (2) tree reconstruction from the distance matrix. We proposed two novel methods for distance matrix estimation, as outlined in the DCLEAR package. Using our method, we find that two of the more sophisticated distance methods display a substantially improved level of performance compared to the traditional Hamming distance method. DCLEAR is open source and freely available from R CRAN under the GNU General Public License, version 3. Conclusions: DCLEAR is a powerful resource for single cell lineage reconstruction.
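The two-step scheme named in this abstract (estimate a distance matrix from barcodes, then build a tree from the distances) can be sketched in a few lines. The sketch below is not the DCLEAR implementation, which is an R package with its own distance estimators; it assumes the plain Hamming distance that the abstract mentions as the traditional baseline, uses a simple average-linkage (UPGMA-style) merge for step 2, and all function names and the example barcodes are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Count positions at which two equal-length barcodes differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def distance_matrix(barcodes):
    """Step 1: pairwise Hamming distance for every pair of cells."""
    return {
        frozenset((u, v)): float(hamming(barcodes[u], barcodes[v]))
        for u, v in combinations(barcodes, 2)
    }


def average_linkage_tree(barcodes):
    """Step 2: repeatedly merge the two closest clusters.

    Returns the tree as nested tuples of cell names.
    """
    leaf_dist = distance_matrix(barcodes)
    # Map each cluster (a leaf name or a nested tuple) to its leaf names.
    clusters = {name: [name] for name in barcodes}

    def avg(c1, c2):
        # Mean leaf-to-leaf distance between two clusters.
        pairs = [(a, b) for a in clusters[c1] for b in clusters[c2]]
        return sum(leaf_dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: avg(*p))
        clusters[(c1, c2)] = clusters.pop(c1) + clusters.pop(c2)
    return next(iter(clusters))


if __name__ == "__main__":
    # "0" stands for an unedited site; letters stand for edit outcomes.
    cells = {"c1": "AA000", "c2": "AA0B0", "c3": "0C0DD", "c4": "0C0D0"}
    print(average_linkage_tree(cells))
```

Keeping the distance function separate from the tree builder mirrors the two-step design: a more sophisticated distance estimate can replace `hamming` without touching step 2.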
Affiliation(s)
- Wuming Gong
- Lillehei Heart Institute, University of Minnesota, Minneapolis, USA
| | - Hyunwoo J Kim
- Department of Computer Science and Engineering, Korea University, Seoul, Republic of Korea
| | - Daniel J Garry
- Lillehei Heart Institute, University of Minnesota, Minneapolis, USA
| | - Il-Youp Kwak
- Department of Applied Statistics, Chung-Ang University, Seoul, Republic of Korea.
|
29
|
Needham J, Metzis V. Heads or tails: Making the spinal cord. Dev Biol 2022; 485:80-92. [DOI: 10.1016/j.ydbio.2022.03.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/10/2021] [Revised: 12/15/2021] [Accepted: 03/02/2022] [Indexed: 12/14/2022]
|
30
|
LINEAGE: Label-free identification of endogenous informative single-cell mitochondrial RNA mutation for lineage analysis. Proc Natl Acad Sci U S A 2022; 119:2119767119. [PMID: 35086932 PMCID: PMC8812554 DOI: 10.1073/pnas.2119767119] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Accepted: 12/20/2021] [Indexed: 02/07/2023] Open
Abstract
Lineage analysis is an important assay for developmental biology, cancer biology, etc. Traditional tools in this field are time consuming, technically challenging, and require preexisting knowledge. By integrating exogenous barcodes into cells, single-cell RNA-sequencing (scRNA-seq) can be used to conduct such tasks, but these assays require significant expertise in both wet- and dry-laboratory experiments. We developed a user-friendly algorithm to conduct cell-lineage inference based solely on endogenous markers in label-free scRNA-seq. This algorithm is able to identify lineage-informative mutations among many interfering mitochondrial RNA variants with high accuracy and efficiency. With this algorithm, we removed most of the technical hurdles of lineage analysis on scRNA-seq, which will dramatically accelerate its application in biological research.
Single-cell RNA-sequencing (scRNA-seq) has become a powerful tool for biomedical research by providing a variety of valuable information with the advancement of computational tools. Lineage analysis based on scRNA-seq provides key insights into the fate of individual cells in various systems. However, such analysis is limited by several technical challenges. On top of considerable computational expertise and resources, these analyses also require specific types of matching data such as exogenous barcode information or bulk assay for transposase-accessible chromatin with high throughput sequencing (ATAC-seq) data. To overcome these technical challenges, we developed a user-friendly computational algorithm called “LINEAGE” (label-free identification of endogenous informative single-cell mitochondrial RNA mutation for lineage analysis). To screen out endogenous lineage markers located on mitochondrial reads from label-free scRNA-seq data and conduct lineage inference, LINEAGE integrates a marker selection strategy by feature subspace separation and de novo “low cross-entropy subspaces” identification. In this process, the mutation type and subspace–subspace “cross-entropy” of features are both taken into consideration. LINEAGE outperformed three other methods designed for similar tasks, as tested on two standard datasets, in terms of biological accuracy and computational efficiency. Applied to a label-free scRNA-seq dataset of BRAF-mutated cancer cells, LINEAGE also revealed genes that contribute to BRAF inhibitor resistance. LINEAGE removes most of the technical hurdles of lineage analysis, which will remarkably accelerate the discovery of important genes or cell-lineage clusters from scRNA-seq data.
|