51
Lin J, Fukuyama J. Calibrating dimension reduction hyperparameters in the presence of noise. PLoS Comput Biol 2024; 20:e1012427. [PMID: 39264943 PMCID: PMC11421778 DOI: 10.1371/journal.pcbi.1012427] [Received: 01/30/2024] [Revised: 09/24/2024] [Accepted: 08/19/2024] [Indexed: 09/14/2024] Open
Abstract
The goal of dimension reduction tools is to construct a low-dimensional representation of high-dimensional data. These tools are employed for a variety of reasons, such as noise reduction, visualization, and lower computational costs. However, a fundamental issue that is discussed in other modeling problems is often overlooked in dimension reduction: overfitting. In the context of other modeling problems, techniques such as feature selection, cross-validation, and regularization are employed to combat overfitting, but rarely are such precautions taken when applying dimension reduction. Prior applications of the two most popular non-linear dimension reduction methods, t-SNE and UMAP, fail to acknowledge data as a combination of signal and noise when assessing performance. These methods are typically calibrated to capture the entirety of the data, not just the signal. In this paper, we demonstrate the importance of acknowledging noise when calibrating hyperparameters and present a framework that enables users to do so. We use this framework to explore the role hyperparameter calibration plays in overfitting the data when applying t-SNE and UMAP. More specifically, we show previously recommended values for perplexity and n_neighbors are too small and overfit the noise. We also provide a workflow others may use to calibrate hyperparameters in the presence of noise.
Affiliation(s)
- Justin Lin
- Department of Mathematics, Indiana University, Bloomington, Indiana, United States of America
- Julia Fukuyama
- Department of Statistics, Indiana University, Bloomington, Indiana, United States of America
52
Nanduri S, Black A, Bedford T, Huddleston J. Dimensionality reduction distills complex evolutionary relationships in seasonal influenza and SARS-CoV-2. bioRxiv 2024:2024.02.07.579374. [PMID: 39253501 PMCID: PMC11383015 DOI: 10.1101/2024.02.07.579374] [Indexed: 09/11/2024]
Abstract
Public health researchers and practitioners commonly infer phylogenies from viral genome sequences to understand transmission dynamics and identify clusters of genetically-related samples. However, viruses that reassort or recombine violate phylogenetic assumptions and require more sophisticated methods. Even when phylogenies are appropriate, they can be unnecessary or difficult to interpret without specialty knowledge. For example, pairwise distances between sequences can be enough to identify clusters of related samples or assign new samples to existing phylogenetic clusters. In this work, we tested whether dimensionality reduction methods could capture known genetic groups within two human pathogenic viruses that cause substantial human morbidity and mortality and frequently reassort or recombine, respectively: seasonal influenza A/H3N2 and SARS-CoV-2. We applied principal component analysis (PCA), multidimensional scaling (MDS), t-distributed stochastic neighbor embedding (t-SNE), and uniform manifold approximation and projection (UMAP) to sequences with well-defined phylogenetic clades and either reassortment (H3N2) or recombination (SARS-CoV-2). For each low-dimensional embedding of sequences, we calculated the correlation between pairwise genetic and Euclidean distances in the embedding and applied a hierarchical clustering method to identify clusters in the embedding. We measured the accuracy of clusters compared to previously defined phylogenetic clades, reassortment clusters, or recombinant lineages. We found that MDS embeddings accurately represented pairwise genetic distances including the intermediate placement of recombinant SARS-CoV-2 lineages between parental lineages. Clusters from t-SNE embeddings accurately recapitulated known phylogenetic clades, H3N2 reassortment groups, and SARS-CoV-2 recombinant lineages. 
We show that simple statistical methods without a biological model can accurately represent known genetic relationships for relevant human pathogenic viruses. Our open source implementation of these methods for analysis of viral genome sequences can be easily applied when phylogenetic methods are either unnecessary or inappropriate.
Affiliation(s)
- Sravani Nanduri
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
- Allison Black
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Trevor Bedford
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Howard Hughes Medical Institute, Seattle, WA, USA
- John Huddleston
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
53
Shah M, Guo L, Xu X, Deng L, Lu K, Dong J, Zhao C, Xu J. eLIMS: Ensemble Learning-Based Spatial Segmentation of Mass Spectrometry Imaging to Explore Metabolic Heterogeneity. J Proteome Res 2024; 23:3088-3095. [PMID: 38690713 DOI: 10.1021/acs.jproteome.3c00764] [Indexed: 05/02/2024]
Abstract
Spatial segmentation is an essential processing method for image analysis aiming to identify the characteristic suborgans or microregions from mass spectrometry imaging (MSI) data, which is critical for understanding the spatial heterogeneity of biological information and function and the underlying molecular signatures. Due to the intrinsic characteristics of MSI data including spectral nonlinearity, high-dimensionality, and large data size, the common segmentation methods lack the capability for capturing the accurate microregions associated with biological functions. Here we proposed an ensemble learning-based spatial segmentation strategy, named eLIMS, that combines a randomized unified manifold approximation and projection (r-UMAP) dimensionality reduction module for extracting significant features and an ensemble pixel clustering module for aggregating the clustering maps from r-UMAP. Three MSI datasets are used to evaluate the performance of eLIMS, including mouse fetus, human adenocarcinoma, and mouse brain. Experimental results demonstrate that the proposed method has potential in partitioning the heterogeneous tissues into several subregions associated with anatomical structure, i.e., the suborgans of the brain region in mouse fetus data are identified as dorsal pallium, midbrain, and brainstem. Furthermore, it effectively discovers critical microregions related to physiological and pathological variations offering new insight into metabolic heterogeneity.
Affiliation(s)
- Mudassir Shah
- Department of Electronic Science, Fujian Provincial Key Laboratory of Plasma and Magnetic Resonance, Xiamen University, Xiamen 361005, China
- Lei Guo
- Interdisciplinary Institute of Medical Engineering, Fuzhou University, Fuzhou 350108, China
- Xiangnan Xu
- School of Business and Economics, Humboldt-Universität zu Berlin, Berlin 10099, Germany
- Lingli Deng
- Department of Information Engineering, East China University of Technology, Nanchang 330013, China
- Keyi Lu
- Department of Electronic Science, Fujian Provincial Key Laboratory of Plasma and Magnetic Resonance, Xiamen University, Xiamen 361005, China
- Jiyang Dong
- Department of Electronic Science, Fujian Provincial Key Laboratory of Plasma and Magnetic Resonance, Xiamen University, Xiamen 361005, China
- Chao Zhao
- Bionic Sensing and Intelligence Center, Institute of Biomedical and Health Engineering, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, China
- Jingjing Xu
- Department of Electronic Science, Fujian Provincial Key Laboratory of Plasma and Magnetic Resonance, Xiamen University, Xiamen 361005, China
54
Arevalo J, Su E, Ewald JD, van Dijk R, Carpenter AE, Singh S. Evaluating batch correction methods for image-based cell profiling. Nat Commun 2024; 15:6516. [PMID: 39095341 PMCID: PMC11297288 DOI: 10.1038/s41467-024-50613-5] [Received: 08/29/2023] [Accepted: 07/13/2024] [Indexed: 08/04/2024] Open
Abstract
High-throughput image-based profiling platforms are powerful technologies capable of collecting data from billions of cells exposed to thousands of perturbations in a time- and cost-effective manner. Therefore, image-based profiling data has been increasingly used for diverse biological applications, such as predicting drug mechanism of action or gene function. However, batch effects severely limit community-wide efforts to integrate and interpret image-based profiling data collected across different laboratories and equipment. To address this problem, we benchmark ten high-performing single-cell RNA sequencing (scRNA-seq) batch correction techniques, representing diverse approaches, using a newly released Cell Painting dataset, JUMP. We focus on five scenarios with varying complexity, ranging from batches prepared in a single lab over time to batches imaged using different microscopes in multiple labs. We find that Harmony and Seurat RPCA are noteworthy, consistently ranking among the top three methods for all tested scenarios while maintaining computational efficiency. Our proposed framework, benchmark, and metrics can be used to assess new batch correction methods in the future. This work paves the way for improvements that enable the community to make the best use of public Cell Painting data for scientific discovery.
Affiliation(s)
- John Arevalo
- Imaging Platform, Broad Institute of MIT and Harvard, 02142, Cambridge, MA, USA
- Ellen Su
- Imaging Platform, Broad Institute of MIT and Harvard, 02142, Cambridge, MA, USA
- Jessica D Ewald
- Imaging Platform, Broad Institute of MIT and Harvard, 02142, Cambridge, MA, USA
- Robert van Dijk
- Imaging Platform, Broad Institute of MIT and Harvard, 02142, Cambridge, MA, USA
- Anne E Carpenter
- Imaging Platform, Broad Institute of MIT and Harvard, 02142, Cambridge, MA, USA
- Shantanu Singh
- Imaging Platform, Broad Institute of MIT and Harvard, 02142, Cambridge, MA, USA
55
Lobentanzer S, Rodriguez-Mier P, Bauer S, Saez-Rodriguez J. Molecular causality in the advent of foundation models. Mol Syst Biol 2024; 20:848-858. [PMID: 38890548 PMCID: PMC11297329 DOI: 10.1038/s44320-024-00041-w] [Received: 01/17/2024] [Revised: 03/18/2024] [Accepted: 03/21/2024] [Indexed: 06/20/2024] Open
Abstract
Correlation is not causation: this simple and uncontroversial statement has far-reaching implications. Defining and applying causality in biomedical research has posed significant challenges to the scientific community. In this perspective, we attempt to connect the partly disparate fields of systems biology, causal reasoning, and machine learning to inform future approaches in the field of systems biology and molecular medicine.
Affiliation(s)
- Sebastian Lobentanzer
- Heidelberg University, Faculty of Medicine and Heidelberg University Hospital, Institute for Computational Biomedicine, Heidelberg, Germany
- Pablo Rodriguez-Mier
- Heidelberg University, Faculty of Medicine and Heidelberg University Hospital, Institute for Computational Biomedicine, Heidelberg, Germany
- Julio Saez-Rodriguez
- Heidelberg University, Faculty of Medicine and Heidelberg University Hospital, Institute for Computational Biomedicine, Heidelberg, Germany
56
Fan JL, Nazaret A, Azizi E. A thousand and one tumors: the promise of AI for cancer biology. Nat Methods 2024; 21:1403-1406. [PMID: 39122940 DOI: 10.1038/s41592-024-02364-w] [Indexed: 08/12/2024]
Affiliation(s)
- Joy Linyue Fan
- Department of Biomedical Engineering, Columbia University, New York, NY, USA
- Irving Institute for Cancer Dynamics, Columbia University, New York, NY, USA
- Achille Nazaret
- Irving Institute for Cancer Dynamics, Columbia University, New York, NY, USA
- Department of Computer Science, Columbia University, New York, NY, USA
- Elham Azizi
- Department of Biomedical Engineering, Columbia University, New York, NY, USA
- Irving Institute for Cancer Dynamics, Columbia University, New York, NY, USA
- Department of Computer Science, Columbia University, New York, NY, USA
- Data Science Institute, Columbia University, New York, NY, USA
- Herbert Irving Comprehensive Cancer Center, Columbia University, New York, NY, USA
57
Lause J, Kobak D, Berens P. The art of seeing the elephant in the room: 2D embeddings of single-cell data do make sense. bioRxiv 2024:2024.03.26.586728. [PMID: 38585748 PMCID: PMC10996625 DOI: 10.1101/2024.03.26.586728] [Indexed: 04/09/2024]
Abstract
A recent paper in PLOS Computational Biology (Chari and Pachter, 2023) claimed that t-SNE and UMAP embeddings of single-cell datasets fail to capture true biological structure. The authors argued that such embeddings are as arbitrary and as misleading as forcing the data into an elephant shape. Here we show that this conclusion was based on inadequate and limited metrics of embedding quality. More appropriate metrics quantifying neighborhood and class preservation reveal the elephant in the room: while t-SNE and UMAP embeddings of single-cell data do not preserve high-dimensional distances, they can nevertheless provide biologically relevant information.
58
Cess CG, Haghverdi L. Compound-SNE: Comparative alignment of t-SNEs for multiple single-cell omics data visualisation. Bioinformatics 2024; 40:btae471. [PMID: 39052868 PMCID: PMC11290359 DOI: 10.1093/bioinformatics/btae471] [Received: 02/26/2024] [Revised: 07/10/2024] [Accepted: 07/24/2024] [Indexed: 07/27/2024] Open
Abstract
Summary: One of the first steps in single-cell omics data analysis is visualization, which allows researchers to see how well-separated cell-types are from each other. When visualizing multiple datasets at once, data integration/batch correction methods are used to merge the datasets. While needed for downstream analyses, these methods modify feature space (e.g. gene expression)/PCA space in order to mix cell-types between batches as well as possible. This obscures sample-specific features and breaks down local embedding structures that can be seen when a sample is embedded alone. Therefore, in order to improve visual comparisons between large numbers of samples (e.g., multiple patients, omic modalities, different time points), we introduce Compound-SNE, which performs what we term a soft alignment of samples in embedding space. We show that Compound-SNE is able to align cell-types in embedding space across samples, while preserving local embedding structures from when samples are embedded independently. Availability and implementation: Python code for Compound-SNE is available for download at https://github.com/HaghverdiLab/Compound-SNE. Supplementary information: Available online. Provides algorithmic details and additional tests.
Affiliation(s)
- Colin G Cess
- Berlin Institute for Medical Systems Biology, Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (BIMSB-MDC), Berlin 10115, Germany
- Laleh Haghverdi
- Berlin Institute for Medical Systems Biology, Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (BIMSB-MDC), Berlin 10115, Germany
59
Sun ED, Zhou OY, Hauptschein M, Rappoport N, Xu L, Navarro Negredo P, Liu L, Rando TA, Zou J, Brunet A. Spatiotemporal transcriptomic profiling and modeling of mouse brain at single-cell resolution reveals cell proximity effects of aging and rejuvenation. bioRxiv 2024:2024.07.16.603809. [PMID: 39071282 PMCID: PMC11275735 DOI: 10.1101/2024.07.16.603809] [Indexed: 07/30/2024]
Abstract
Old age is associated with a decline in cognitive function and an increase in neurodegenerative disease risk [1]. Brain aging is complex and accompanied by many cellular changes [2-20]. However, the influence that aged cells have on neighboring cells and how this contributes to tissue decline is unknown. More generally, the tools to systematically address this question in aging tissues have not yet been developed. Here, we generate spatiotemporal data at single-cell resolution for the mouse brain across lifespan, and we develop the first machine learning models based on spatial transcriptomics ('spatial aging clocks') to reveal cell proximity effects during brain aging and rejuvenation. We collect a single-cell spatial transcriptomics brain atlas of 4.2 million cells from 20 distinct ages and across two rejuvenating interventions: exercise and partial reprogramming. We identify spatial and cell type-specific transcriptomic fingerprints of aging, rejuvenation, and disease, including for rare cell types. Using spatial aging clocks and deep learning models, we find that T cells, which infiltrate the brain with age, have a striking pro-aging proximity effect on neighboring cells. Surprisingly, neural stem cells have a strong pro-rejuvenating effect on neighboring cells. By developing computational tools to identify mediators of these proximity effects, we find that pro-aging T cells trigger a local inflammatory response likely via interferon-γ whereas pro-rejuvenating neural stem cells impact the metabolism of neighboring cells possibly via growth factors (e.g. vascular endothelial growth factor) and extracellular vesicles, and we experimentally validate some of these predictions. These results suggest that rare cells can have a drastic influence on their neighbors and could be targeted to counter tissue aging.
We anticipate that these spatial aging clocks will not only allow scalable assessment of the efficacy of interventions for aging and disease but also represent a new tool for studying cell-cell interactions in many spatial contexts.
Affiliation(s)
- Eric D. Sun
- Department of Biomedical Data Science, Stanford University, CA, USA
- Department of Genetics, Stanford University, CA, USA
- Olivia Y. Zhou
- Department of Genetics, Stanford University, CA, USA
- Stanford Biophysics Program, Stanford University, CA, USA
- Stanford Medical Scientist Training Program, Stanford University, CA, USA
- Lucy Xu
- Department of Genetics, Stanford University, CA, USA
- Department of Biology, Stanford University, CA, USA
- Ling Liu
- Department of Neurology, Stanford University, CA, USA
- Department of Neurology, UCLA, Los Angeles, CA, USA
- Eli and Edythe Broad Center for Regenerative Medicine and Stem Cell Biology, UCLA, Los Angeles, CA, USA
- Thomas A. Rando
- Department of Neurology, Stanford University, CA, USA
- Department of Neurology, UCLA, Los Angeles, CA, USA
- Eli and Edythe Broad Center for Regenerative Medicine and Stem Cell Biology, UCLA, Los Angeles, CA, USA
- James Zou
- Department of Biomedical Data Science, Stanford University, CA, USA
- These authors contributed equally: James Zou, Anne Brunet
- Anne Brunet
- Department of Genetics, Stanford University, CA, USA
- Glenn Center for the Biology of Aging, Stanford University, CA, USA
- Wu Tsai Neurosciences Institute, Stanford University, CA, USA
- These authors contributed equally: James Zou, Anne Brunet
60
Cristian PM, Aarón VJ, Armando EHD, Estrella MLY, Daniel NR, David GV, Edgar M, Paul SCJ, Osbaldo RA. Diffusion on PCA-UMAP Manifold: The Impact of Data Structure Preservation to Denoise High-Dimensional Single-Cell RNA Sequencing Data. Biology 2024; 13:512. [PMID: 39056705 PMCID: PMC11274112 DOI: 10.3390/biology13070512] [Received: 06/01/2024] [Accepted: 06/25/2024] [Indexed: 07/28/2024]
Abstract
Single-cell transcriptomics (scRNA-seq) is revolutionizing biological research, yet it faces challenges such as inefficient transcript capture and noise. To address these challenges, methods like neighbor averaging or graph diffusion are used. These methods often rely on k-nearest neighbor graphs from low-dimensional manifolds. However, scRNA-seq data suffer from the 'curse of dimensionality', leading to the over-smoothing of data when using imputation methods. To overcome this, sc-PHENIX employs a PCA-UMAP diffusion method, which enhances the preservation of data structures and allows for a refined use of PCA dimensions and diffusion parameters (e.g., k-nearest neighbors, exponentiation of the Markov matrix) to minimize noise introduction. This approach enables a more accurate construction of the exponentiated Markov matrix (cell neighborhood graph), surpassing methods like MAGIC. sc-PHENIX significantly mitigates over-smoothing, as validated through various scRNA-seq datasets, demonstrating improved cell phenotype representation. Applied to a multicellular tumor spheroid dataset, sc-PHENIX identified known extreme phenotype states, showcasing its effectiveness. sc-PHENIX is open-source and available for use and modification.
Affiliation(s)
- Padron-Manrique Cristian
- Human Systems Biology Laboratory, Instituto Nacional de Medicina Genómica (INMEGEN), Periferico Sur 4809, Arenal Tepepan, Tlalpan, Mexico City 14610, Mexico; (P.-M.C.); (V.-J.A.); (E.-H.D.A.); (N.-R.D.); (G.-V.D.); (M.E.)
- Programa de Doctorado en Ciencias Biomédicas, Circuito Posgrados, Ciudad Universitaria, Alcaldía Coyoacán Unidad de Posgrado Edificio B primer Piso, Universidad Nacional Autónoma de México (UNAM), Mexico City 04510, Mexico
- Vázquez-Jiménez Aarón
- Human Systems Biology Laboratory, Instituto Nacional de Medicina Genómica (INMEGEN), Periferico Sur 4809, Arenal Tepepan, Tlalpan, Mexico City 14610, Mexico; (P.-M.C.); (V.-J.A.); (E.-H.D.A.); (N.-R.D.); (G.-V.D.); (M.E.)
- Esquivel-Hernandez Diego Armando
- Human Systems Biology Laboratory, Instituto Nacional de Medicina Genómica (INMEGEN), Periferico Sur 4809, Arenal Tepepan, Tlalpan, Mexico City 14610, Mexico; (P.-M.C.); (V.-J.A.); (E.-H.D.A.); (N.-R.D.); (G.-V.D.); (M.E.)
- Martinez-Lopez Yoscelina Estrella
- Human Systems Biology Laboratory, Instituto Nacional de Medicina Genómica (INMEGEN), Periferico Sur 4809, Arenal Tepepan, Tlalpan, Mexico City 14610, Mexico; (P.-M.C.); (V.-J.A.); (E.-H.D.A.); (N.-R.D.); (G.-V.D.); (M.E.)
- Programa de Doctorado en Ciencias Médicas, Odontológicas y de la Salud, Unidad de Posgrado, Edificio A, 1er Piso, Circuito Posgrados, Ciudad Universitaria, Alcaldía Coyoacán, Universidad Nacional Autónoma de México (UNAM), Mexico City 04510, Mexico
- Neri-Rosario Daniel
- Human Systems Biology Laboratory, Instituto Nacional de Medicina Genómica (INMEGEN), Periferico Sur 4809, Arenal Tepepan, Tlalpan, Mexico City 14610, Mexico; (P.-M.C.); (V.-J.A.); (E.-H.D.A.); (N.-R.D.); (G.-V.D.); (M.E.)
- Programa de Maestría en Ciencias Bioquímicas, Unidad de Posgrado, Edificio B, 1er Piso, Circuito de los Posgrados, Ciudad Universitaria, Universidad Nacional Autónoma de México (UNAM), Alcaldía Coyoacán, Ciudad de México 04510, Mexico
- Giron-Villalobos David
- Human Systems Biology Laboratory, Instituto Nacional de Medicina Genómica (INMEGEN), Periferico Sur 4809, Arenal Tepepan, Tlalpan, Mexico City 14610, Mexico; (P.-M.C.); (V.-J.A.); (E.-H.D.A.); (N.-R.D.); (G.-V.D.); (M.E.)
- Programa de Maestría en Ciencias Bioquímicas, Unidad de Posgrado, Edificio B, 1er Piso, Circuito de los Posgrados, Ciudad Universitaria, Universidad Nacional Autónoma de México (UNAM), Alcaldía Coyoacán, Ciudad de México 04510, Mexico
- Mixcoha Edgar
- Human Systems Biology Laboratory, Instituto Nacional de Medicina Genómica (INMEGEN), Periferico Sur 4809, Arenal Tepepan, Tlalpan, Mexico City 14610, Mexico; (P.-M.C.); (V.-J.A.); (E.-H.D.A.); (N.-R.D.); (G.-V.D.); (M.E.)
- CONAHCYT-INMEGEN, Periferico Sur 4809, Arenal Tepepan, Tlalpan, Mexico City 14610, Mexico
- Sánchez-Castañeda Jean Paul
- Human Systems Biology Laboratory, Instituto Nacional de Medicina Genómica (INMEGEN), Periferico Sur 4809, Arenal Tepepan, Tlalpan, Mexico City 14610, Mexico; (P.-M.C.); (V.-J.A.); (E.-H.D.A.); (N.-R.D.); (G.-V.D.); (M.E.)
- Programa de Maestría en Ciencias Bioquímicas, Unidad de Posgrado, Edificio B, 1er Piso, Circuito de los Posgrados, Ciudad Universitaria, Universidad Nacional Autónoma de México (UNAM), Alcaldía Coyoacán, Ciudad de México 04510, Mexico
- Resendis-Antonio Osbaldo
- Human Systems Biology Laboratory, Instituto Nacional de Medicina Genómica (INMEGEN), Periferico Sur 4809, Arenal Tepepan, Tlalpan, Mexico City 14610, Mexico; (P.-M.C.); (V.-J.A.); (E.-H.D.A.); (N.-R.D.); (G.-V.D.); (M.E.)
- Coordinación de la Investigación Científica-Red de Apoyo a la Investigación, Instituto Nacional de Ciencias Médicas y Nutrición Salvador Zubirán, Vasco de Quiroga, 14, Belisario Dominguez Sección XVI, Tlalpan, Mexico City 14080, Mexico
- Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México (UNAM), Circuito Centro Cultural, Coyoacán, Mexico City 04510, Mexico
61
Arango AS, Park H, Tajkhorshid E. Topological Learning Approach to Characterizing Biological Membranes. J Chem Inf Model 2024; 64:5242-5252. [PMID: 38912752 PMCID: PMC12009557 DOI: 10.1021/acs.jcim.4c00552] [Indexed: 06/25/2024]
Abstract
Biological membranes play key roles in cellular compartmentalization, structure, and its signaling pathways. At varying temperatures, individual membrane lipids sample from different configurations, a process that frequently leads to higher-order phase behavior and phenomena. Here, we present a persistent homology (PH)-based method for quantifying the structural features of individual and bulk lipids, providing local and contextual information on lipid tail organization. Our method leverages the mathematical machinery of algebraic topology and machine learning to infer temperature-dependent structural information on lipids from static coordinates. To train our model, we generated multiple molecular dynamics trajectories of dipalmitoyl-phosphatidylcholine membranes at varying temperatures. A fingerprint was then constructed for each set of lipid coordinates by PH filtration, in which interaction spheres were grown around the lipid atoms while tracking their intersections. The sphere filtration formed a simplicial complex that captures enduring key topological features of the configuration landscape using homology, yielding persistence data. Following fingerprint extraction for physiologically relevant temperatures, the persistence data were used to train an attention-based neural network for assignment of effective temperature values to selected membrane regions. Our persistence homology-based method captures the local structural effects, via effective temperature, of lipids adjacent to other membrane constituents, e.g., sterols and proteins. This topological learning approach can predict lipid effective temperatures from static coordinates across multiple spatial resolutions. The tool, called MembTDA, can be accessed at https://github.com/hyunp2/Memb-TDA.
Affiliation(s)
- Andres S Arango
- Theoretical and Computational Biophysics Group, NIH Resource Center for Macromolecular Modeling and Visualization, Beckman Institute for Advanced Science and Technology, Department of Biochemistry, and Center for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Hyun Park
- Theoretical and Computational Biophysics Group, NIH Resource Center for Macromolecular Modeling and Visualization, Beckman Institute for Advanced Science and Technology, Department of Biochemistry, and Center for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Emad Tajkhorshid
- Theoretical and Computational Biophysics Group, NIH Resource Center for Macromolecular Modeling and Visualization, Beckman Institute for Advanced Science and Technology, Department of Biochemistry, and Center for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
62
Chari T, Gorin G, Pachter L. Stochastic Modeling of Biophysical Responses to Perturbation. bioRxiv 2024:2024.07.04.602131. [PMID: 39005347 PMCID: PMC11245117 DOI: 10.1101/2024.07.04.602131] [Indexed: 07/16/2024]
Abstract
Recent advances in high-throughput, multi-condition experiments allow for genome-wide investigation of how perturbations affect transcription and translation in the cell across multiple biological entities or modalities, from chromatin and mRNA information to protein production and spatial morphology. This presents an unprecedented opportunity to unravel how the processes of DNA and RNA regulation direct cell fate determination and disease response. Most methods designed for analyzing large-scale perturbation data focus on the observational outcomes, e.g., expression; however, many potential transcriptional mechanisms, such as transcriptional bursting or splicing dynamics, can underlie these complex and noisy observations. In this analysis, we demonstrate how a stochastic biophysical modeling approach to interpreting high-throughput perturbation data enables deeper investigation of the 'how' behind such molecular measurements. Our approach takes advantage of modalities already present in data produced with current technologies, such as nascent and mature mRNA measurements, to illuminate transcriptional dynamics induced by perturbation, predict kinetic behaviors in new perturbation settings, and uncover novel populations of cells with distinct kinetic responses to perturbation.
Affiliation(s)
- Tara Chari
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California
| | | | - Lior Pachter
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California
- Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, California
|
63
|
Chen AA, Clark K, Dewey BE, DuVal A, Pellegrini N, Nair G, Jalkh Y, Khalil S, Zurawski J, Calabresi PA, Reich DS, Bakshi R, Shou H, Shinohara RT, Alzheimer’s Disease Neuroimaging Initiative, and North American Imaging in Multiple Sclerosis Cooperative. PARE: A framework for removal of confounding effects from any distance-based dimension reduction method. PLoS Comput Biol 2024; 20:e1012241. [PMID: 38985831 PMCID: PMC11262650 DOI: 10.1371/journal.pcbi.1012241] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2023] [Revised: 07/22/2024] [Accepted: 06/10/2024] [Indexed: 07/12/2024] Open
Abstract
Dimension reduction tools preserving similarity and graph structure such as t-SNE and UMAP can capture complex biological patterns in high-dimensional data. However, these tools typically are not designed to separate effects of interest from unwanted effects due to confounders. We introduce the partial embedding (PARE) framework, which enables removal of confounders from any distance-based dimension reduction method. We then develop partial t-SNE and partial UMAP and apply these methods to genomic and neuroimaging data. For lower-dimensional visualization, our results show that the PARE framework can remove batch effects in single-cell sequencing data as well as separate clinical and technical variability in neuroimaging measures. We demonstrate that the PARE framework extends dimension reduction methods to highlight biological patterns of interest while effectively removing confounding effects.
Affiliation(s)
- Andrew A. Chen
- Department of Public Health Sciences, Medical University of South Carolina, Charleston, South Carolina, United States of America
| | - Kelly Clark
- Penn Statistics in Imaging and Visualization Center, Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
| | - Blake E. Dewey
- Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
| | - Anna DuVal
- Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
| | - Nicole Pellegrini
- Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
| | - Govind Nair
- Translational Neuroradiology Section, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, Maryland, United States of America
| | - Youmna Jalkh
- Department of Neurology, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
| | - Samar Khalil
- Department of Neurology, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
| | - Jon Zurawski
- Department of Neurology, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
| | - Peter A. Calabresi
- Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
| | - Daniel S. Reich
- Translational Neuroradiology Section, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, Maryland, United States of America
| | - Rohit Bakshi
- Department of Neurology, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
- Department of Radiology, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
| | - Haochang Shou
- Penn Statistics in Imaging and Visualization Center, Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Center for Biomedical Image Computing and Analytics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
| | - Russell T. Shinohara
- Penn Statistics in Imaging and Visualization Center, Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Center for Biomedical Image Computing and Analytics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
|
64
|
Vardaman D, Ali MA, Bolding C, Tidwell H, Stephens H, Tyrrell DJ. Development of a Spectral Flow Cytometry Analysis Pipeline for High-Dimensional Immune Cell Characterization. bioRxiv 2024:2024.06.19.599633. [PMID: 38948780 PMCID: PMC11213029 DOI: 10.1101/2024.06.19.599633] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/02/2024]
Abstract
Flow cytometry is a widely used technique for immune cell analysis, offering insights into cell composition and function. Spectral flow cytometry allows for high-dimensional analysis of immune cells, overcoming limitations of conventional flow cytometry. However, analyzing data from large antibody panels can be challenging using traditional bi-axial gating strategies. Here, we present a novel analysis pipeline designed to improve analysis of spectral flow cytometry. We employ this method to identify rare T cell populations in aging. We isolated splenocytes from young (2-3 months) and aged (18-19 months) female mice then stained these with a panel of 20 fluorescently labeled antibodies. Spectral flow cytometry was performed, followed by data processing and analysis using Python within a Jupyter Notebook environment to perform batch correction, unsupervised clustering, dimensionality reduction, and differential expression analysis. Our analysis of 3,776,804 T cells from 11 spleens revealed 34 distinct T cell clusters identified by surface marker expression. We observed significant differences between young and aged mice, with certain clusters enriched in one age group over the other. Naïve, effector memory, and central memory CD8+ and CD4+ T cell subsets exhibited age-associated changes in abundance and marker expression. Additionally, γδ T cell clusters showed differential abundance between age groups. By leveraging high-dimensional analysis methods borrowed from single-cell RNA sequencing analysis, we identified age-related differences in T cell subsets, providing insights into the immune aging process. This approach offers a robust, free, and easily implemented analysis pipeline for spectral flow cytometry data that may facilitate the discovery of novel therapeutic targets for age-related immune dysfunction.
Affiliation(s)
- Donald Vardaman
- Department of Pathology, University of Alabama at Birmingham, Birmingham, AL, 35205 USA
| | - Md Akkas Ali
- Department of Pathology, University of Alabama at Birmingham, Birmingham, AL, 35205 USA
- Biochemistry and Structural Biology Theme, Graduate Biomedical Sciences, University of Alabama at Birmingham, Birmingham, AL, 35205 USA
| | - Chase Bolding
- Department of Pathology, University of Alabama at Birmingham, Birmingham, AL, 35205 USA
| | - Harrison Tidwell
- Department of Pathology, University of Alabama at Birmingham, Birmingham, AL, 35205 USA
| | - Holly Stephens
- Department of Nutrition Sciences, University of Alabama at Birmingham, Birmingham, AL, 35205 USA
- Immunology Theme, Graduate Biomedical Sciences, University of Alabama at Birmingham, Birmingham, AL, 35205 USA
| | - Daniel J. Tyrrell
- Department of Pathology, University of Alabama at Birmingham, Birmingham, AL, 35205 USA
|
65
|
Jia Y, Ma P, Yao Q. CellMarkerPipe: cell marker identification and evaluation pipeline in single cell transcriptomes. Sci Rep 2024; 14:13151. [PMID: 38849445 PMCID: PMC11161599 DOI: 10.1038/s41598-024-63492-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2024] [Accepted: 05/29/2024] [Indexed: 06/09/2024] Open
Abstract
Assessing marker genes from all cell clusters can be time-consuming and can lack a systematic strategy. Streamlining this process through a unified computational platform that automates identification and benchmarking will greatly enhance efficiency and ensure a fair evaluation. We therefore developed a novel computational platform, cellMarkerPipe (https://github.com/yao-laboratory/cellMarkerPipe), for automated cell-type-specific marker gene identification from scRNA-seq data, coupled with a comprehensive evaluation scheme. CellMarkerPipe adaptively wraps around a collection of commonly used and state-of-the-art tools, including Seurat, COSG, SC3, SCMarker, COMET, and scGeneFit. From rigorous testing across diverse samples, we find that SCMarker performs reliably overall in single marker gene selection, with COSG showing commendable speed and comparable efficacy. Furthermore, we demonstrate the pivotal role of our approach in real-world medical datasets. This general and open-source pipeline stands as a significant advancement in streamlining cell marker gene identification and evaluation, fitting broad applications in the field of cellular biology and medical research.
Affiliation(s)
- Yinglu Jia
- School of Computing, University of Nebraska Lincoln, 256 Avery Hall, Lincoln, NE, 68588, USA
- Department of Chemistry, University of Nebraska Lincoln, Hamilton Hall, Lincoln, NE, 68588, USA
| | - Pengchong Ma
- School of Computing, University of Nebraska Lincoln, 256 Avery Hall, Lincoln, NE, 68588, USA
| | - Qiuming Yao
- School of Computing, University of Nebraska Lincoln, 256 Avery Hall, Lincoln, NE, 68588, USA.
- Nebraska Center for the Prevention of Obesity Diseases, 316C Leverton Hall, Lincoln, NE, 68583, USA.
- Nebraska Center for Virology, University of Nebraska, 4240 Fair St., Lincoln, NE, 68583, USA.
|
66
|
Sampson MM, Morgan RK, Sloan SA, Bakulski KM. Single-cell investigation of lead toxicity from neurodevelopment to neurodegeneration: Current review and future opportunities. Curr Opin Toxicol 2024; 38:100464. [PMID: 39086983 PMCID: PMC11290315 DOI: 10.1016/j.cotox.2024.100464] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/02/2024]
Abstract
Human exposure to the metal lead (Pb) is prevalent and associated with adverse neurodevelopmental and neurodegenerative outcomes. Pb disrupts normal brain function by inducing oxidative stress and neuroinflammation, altering cellular metabolism, and displacing essential metals. Prior studies on the molecular impacts of Pb have examined bulk tissues, which collapse information across all cell types, or in targeted cells, which are limited to cell autonomous effects. These approaches are unable to represent the complete biological implications of Pb exposure because the brain is a cooperative network of highly heterogeneous cells, with cellular diversity and proportions shifting throughout development, by brain region, and with disease. New technologies are necessary to investigate whether Pb and other environmental exposures alter cell composition in the brain and whether they cause molecular changes in a cell-type-specific manner. Cutting-edge, single-cell approaches now enable research resolving cell-type-specific effects from bulk tissues. This article reviews existing Pb neurotoxicology studies with genome-wide molecular signatures and provides a path forward for the field to implement single-cell approaches with practical recommendations.
Affiliation(s)
- Maureen M Sampson
- Department of Human Genetics, School of Medicine, Emory University, Atlanta, GA, USA
| | - Rachel K Morgan
- Department of Environmental Health Sciences, School of Public Health, University of Michigan, Ann Arbor, MI, USA
| | - Steven A Sloan
- Department of Human Genetics, School of Medicine, Emory University, Atlanta, GA, USA
| | - Kelly M Bakulski
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
|
67
|
Marx V. Seeing data as t-SNE and UMAP do. Nat Methods 2024; 21:930-933. [PMID: 38789649 DOI: 10.1038/s41592-024-02301-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/26/2024]
|
68
|
Taylor MA, Kandyba E, Halliwill K, Delrosario R, Khoroshkin M, Goodarzi H, Quigley D, Li YR, Wu D, Bollam SR, Mirzoeva OK, Akhurst RJ, Balmain A. Stem-cell states converge in multistage cutaneous squamous cell carcinoma development. Science 2024; 384:eadi7453. [PMID: 38815020 DOI: 10.1126/science.adi7453] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Accepted: 04/05/2024] [Indexed: 06/01/2024]
Abstract
Stem cells play a critical role in cancer development by contributing to cell heterogeneity, lineage plasticity, and drug resistance. We created gene expression networks from hundreds of mouse tissue samples (both normal and tumor) and integrated these with lineage tracing and single-cell RNA-seq, to identify convergence of cell states in premalignant tumor cells expressing markers of lineage plasticity and drug resistance. Two of these cell states representing multilineage plasticity or proliferation were inversely correlated, suggesting a mutually exclusive relationship. Treatment of carcinomas in vivo with chemotherapy repressed the proliferative state and activated multilineage plasticity whereas inhibition of differentiation repressed plasticity and potentiated responses to cell cycle inhibitors. Manipulation of this cell state transition point may provide a source of potential combinatorial targets for cancer therapy.
Affiliation(s)
- Mark A Taylor
- Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA 94158, USA
- Clinical Research Centre, Medical University of Bialystok, Bialystok 15-089, Poland
| | - Eve Kandyba
- Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA 94158, USA
| | - Kyle Halliwill
- Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA 94158, USA
- AbbVie, South San Francisco, CA 94080, USA
| | - Reyno Delrosario
- Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA 94158, USA
| | - Matvei Khoroshkin
- Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA 94158, USA
| | - Hani Goodarzi
- Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA 94158, USA
- Department of Biochemistry and Biophysics, University of California San Francisco, San Francisco, CA 94518, USA
- Department of Urology, University of California San Francisco, San Francisco, CA 94518, USA
- Arc Institute, Palo Alto, CA 94304, USA
| | - David Quigley
- Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA 94158, USA
- Department of Urology, University of California San Francisco, San Francisco, CA 94518, USA
- Department of Epidemiology & Biostatistics, University of California San Francisco, San Francisco, CA 94518, USA
| | - Yun Rose Li
- Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA 94158, USA
- Department of Radiation Oncology, City of Hope National Medical Center, Duarte, CA 91010, USA
- Department of Cancer Genetics & Epigenetics, City of Hope National Medical Center, Duarte, CA 91010, USA
- Division of Quantitative Medicine & Systems Biology, Translational Genomics Research Institute, Phoenix, AZ 85004, USA
| | - Di Wu
- Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA 94158, USA
| | - Saumya R Bollam
- Biomedical Sciences Graduate Program, University of California San Francisco, San Francisco, CA 94518, USA
| | - Olga K Mirzoeva
- Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA 94158, USA
| | - Rosemary J Akhurst
- Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA 94158, USA
- Department of Anatomy, University of California San Francisco, San Francisco, CA 94518, USA
| | - Allan Balmain
- Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA 94158, USA
- Department of Biochemistry and Biophysics, University of California San Francisco, San Francisco, CA 94518, USA
|
69
|
Rafelski SM, Theriot JA. Establishing a conceptual framework for holistic cell states and state transitions. Cell 2024; 187:2633-2651. [PMID: 38788687 DOI: 10.1016/j.cell.2024.04.035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2024] [Revised: 04/10/2024] [Accepted: 04/24/2024] [Indexed: 05/26/2024]
Abstract
Cell states were traditionally defined by how they looked, where they were located, and what functions they performed. In this post-genomic era, the field is largely focused on a molecular view of cell state. Moving forward, we anticipate that the observables used to define cell states will evolve again as single-cell imaging and analytics are advancing at a breakneck pace via the collection of large-scale, systematic cell image datasets and the application of quantitative image-based data science methods. This is, therefore, a key moment in the arc of cell biological research to develop approaches that integrate the spatiotemporal observables of the physical structure and organization of the cell with molecular observables toward the concept of a holistic cell state. In this perspective, we propose a conceptual framework for holistic cell states and state transitions that is data-driven, practical, and useful to enable integrative analyses and modeling across many data types.
Affiliation(s)
- Susanne M Rafelski
- Allen Institute for Cell Science, 615 Westlake Avenue N, Seattle, WA 98125, USA.
| | - Julie A Theriot
- Department of Biology and Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA.
|
70
|
Mulè MP, Martins AJ, Cheung F, Farmer R, Sellers BA, Quiel JA, Jain A, Kotliarov Y, Bansal N, Chen J, Schwartzberg PL, Tsang JS. Integrating population and single-cell variations in vaccine responses identifies a naturally adjuvanted human immune setpoint. Immunity 2024; 57:1160-1176.e7. [PMID: 38697118 DOI: 10.1016/j.immuni.2024.04.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2023] [Revised: 01/21/2024] [Accepted: 04/12/2024] [Indexed: 05/04/2024]
Abstract
Multimodal single-cell profiling methods can capture immune cell variations unfolding over time at the molecular, cellular, and population levels. Transforming these data into biological insights remains challenging. Here, we introduce a framework to integrate variations at the human population and single-cell levels in vaccination responses. Comparing responses following AS03-adjuvanted versus unadjuvanted influenza vaccines with CITE-seq revealed AS03-specific early (day 1) response phenotypes, including a B cell signature of elevated germinal center competition. A correlated network of cell-type-specific transcriptional states defined the baseline immune status associated with high antibody responders to the unadjuvanted vaccine. Certain innate subsets in the network appeared "naturally adjuvanted," with transcriptional states resembling those induced uniquely by AS03-adjuvanted vaccination. Consistently, CD14+ monocytes from high responders at baseline had elevated phospho-signaling responses to lipopolysaccharide stimulation. Our findings link baseline immune setpoints to early vaccine responses, with positive implications for adjuvant development and immune response engineering.
Affiliation(s)
- Matthew P Mulè
- Multiscale Systems Biology Section, Laboratory of Immune System Biology, NIAID, NIH, Bethesda, MD, USA; NIH-Oxford-Cambridge Scholars Program, Department of Medicine, University of Cambridge, Cambridge, UK
| | - Andrew J Martins
- Multiscale Systems Biology Section, Laboratory of Immune System Biology, NIAID, NIH, Bethesda, MD, USA
| | - Foo Cheung
- NIH Center for Human Immunology, NIAID, NIH, Bethesda, MD, USA
| | - Rohit Farmer
- NIH Center for Human Immunology, NIAID, NIH, Bethesda, MD, USA
| | - Brian A Sellers
- NIH Center for Human Immunology, NIAID, NIH, Bethesda, MD, USA
| | - Juan A Quiel
- NIH Center for Human Immunology, NIAID, NIH, Bethesda, MD, USA
| | - Arjun Jain
- Multiscale Systems Biology Section, Laboratory of Immune System Biology, NIAID, NIH, Bethesda, MD, USA
| | - Yuri Kotliarov
- NIH Center for Human Immunology, NIAID, NIH, Bethesda, MD, USA
| | - Neha Bansal
- Multiscale Systems Biology Section, Laboratory of Immune System Biology, NIAID, NIH, Bethesda, MD, USA
| | - Jinguo Chen
- NIH Center for Human Immunology, NIAID, NIH, Bethesda, MD, USA
| | - Pamela L Schwartzberg
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA; Cell Signaling and Immunity Section, NIAID, NIH, Bethesda, MD, USA
| | - John S Tsang
- Multiscale Systems Biology Section, Laboratory of Immune System Biology, NIAID, NIH, Bethesda, MD, USA; NIH Center for Human Immunology, NIAID, NIH, Bethesda, MD, USA.
|
71
|
Miles CE, McKinley SA, Ding F, Lehoucq RB. Inferring Stochastic Rates from Heterogeneous Snapshots of Particle Positions. Bull Math Biol 2024; 86:74. [PMID: 38740619 PMCID: PMC11578400 DOI: 10.1007/s11538-024-01301-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2023] [Accepted: 04/20/2024] [Indexed: 05/16/2024]
Abstract
Many imaging techniques for biological systems, like fixation of cells coupled with fluorescence microscopy, provide sharp spatial resolution in reporting locations of individuals at a single moment in time but also destroy the dynamics they intend to capture. These snapshot observations contain no information about individual trajectories, but still encode information about movement and demographic dynamics, especially when combined with a well-motivated biophysical model. The relationship between spatially evolving populations and single-moment representations of their collective locations is well-established with partial differential equations (PDEs) and their inverse problems. However, experimental data is commonly a set of locations whose number is insufficient to approximate a continuous-in-space PDE solution. Here, motivated by popular subcellular imaging data of gene expression, we embrace the stochastic nature of the data and investigate the mathematical foundations of parametrically inferring demographic rates from snapshots of particles undergoing birth, diffusion, and death in a nuclear or cellular domain. Toward inference, we rigorously derive a connection between individual particle paths and their presentation as a Poisson spatial process. Using this framework, we investigate the properties of the resulting inverse problem and study factors that affect quality of inference. One pervasive feature of this experimental regime is the presence of cell-to-cell heterogeneity. Rather than being a hindrance, we show that cell-to-cell geometric heterogeneity can increase the quality of inference on dynamics for certain parameter regimes. Altogether, the results serve as a basis for more detailed investigations of subcellular spatial patterns of RNA molecules and other stochastically evolving populations that can only be observed for single instants in their time evolution.
Affiliation(s)
| | - Scott A McKinley
- Department of Mathematics, Tulane University, New Orleans, LA, USA
| | - Fangyuan Ding
- Departments of Biomedical Engineering, Developmental and Cell Biology, University of California, Irvine, Irvine, USA
| | - Richard B Lehoucq
- Discrete Math and Optimization, Sandia National Laboratories, Albuquerque, NM, USA
|
72
|
Schmidt M, Avagyan S, Reiche K, Binder H, Loeffler-Wirth H. A Spatial Transcriptomics Browser for Discovering Gene Expression Landscapes across Microscopic Tissue Sections. Curr Issues Mol Biol 2024; 46:4701-4720. [PMID: 38785552 PMCID: PMC11119626 DOI: 10.3390/cimb46050284] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2024] [Revised: 04/30/2024] [Accepted: 05/03/2024] [Indexed: 05/25/2024] Open
Abstract
A crucial feature of life is its spatial organization and compartmentalization on the molecular, cellular, and tissue levels. Spatial transcriptomics (ST) technology has opened a new chapter of the sequencing revolution, emerging rapidly with transformative effects across biology. This technique produces extensive and complex sequencing data, raising the need for computational methods for their comprehensive analysis and interpretation. We developed the ST browser web tool for the interactive discovery of ST images, focusing on different functional aspects such as single gene expression, the expression of functional gene sets, as well as the inspection of the spatial patterns of cell-cell interactions. As a unique feature, our tool applies self-organizing map (SOM) machine learning to the ST data. Our SOM data portrayal method generates individual gene expression landscapes for each spot in the ST image, enabling its downstream analysis with high resolution. The performance of the spatial browser is demonstrated by disentangling the intra-tumoral heterogeneity of melanoma and the microarchitecture of the mouse brain. The integration of machine-learning-based SOM portrayal into an interactive ST analysis environment opens novel perspectives for the comprehensive knowledge mining of the organization and interactions of cellular ecosystems.
Affiliation(s)
- Maria Schmidt
- Interdisciplinary Centre for Bioinformatics (IZBI), Leipzig University, Härtelstr. 16-18, 04107 Leipzig, Germany
| | - Susanna Avagyan
- Armenian Bioinformatics Institute, 3/6 Nelson Stepanyan Str., Yerevan 0062, Armenia
| | - Kristin Reiche
- Department of Diagnostics, Fraunhofer Institute for Cell Therapy and Immunology (IZI), Perlickstrasse 1, 04103 Leipzig, Germany
- Institute for Clinical Immunology, University Hospital of Leipzig, 04103 Leipzig, Germany
| | - Hans Binder
- Interdisciplinary Centre for Bioinformatics (IZBI), Leipzig University, Härtelstr. 16-18, 04107 Leipzig, Germany
- Armenian Bioinformatics Institute, 3/6 Nelson Stepanyan Str., Yerevan 0062, Armenia
| | - Henry Loeffler-Wirth
- Interdisciplinary Centre for Bioinformatics (IZBI), Leipzig University, Härtelstr. 16-18, 04107 Leipzig, Germany
|
73
|
Cai L, Anastassiou D. CASCC: a co-expression-assisted single-cell RNA-seq data clustering method. Bioinformatics 2024; 40:btae283. [PMID: 38662553 PMCID: PMC11091742 DOI: 10.1093/bioinformatics/btae283] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2023] [Revised: 03/28/2024] [Accepted: 04/23/2024] [Indexed: 05/15/2024] Open
Abstract
SUMMARY: Existing clustering methods for characterizing cell populations from single-cell RNA sequencing are constrained by several limitations stemming from the fact that clusters often cannot be homogeneous, particularly for transitioning populations. On the other hand, dominant cell populations within samples can be identified independently by their strong gene co-expression signatures using methods unrelated to partitioning. Here, we introduce a clustering method, CASCC (co-expression-assisted single-cell clustering), designed to improve biological accuracy using gene co-expression features identified using an unsupervised adaptive attractor algorithm. CASCC outperformed other methods as evidenced by multiple evaluation metrics, and our results suggest that CASCC can improve the analysis of single-cell transcriptomics, enabling potential new discoveries related to underlying biological mechanisms. AVAILABILITY AND IMPLEMENTATION: The CASCC R package is publicly available at https://github.com/LingyiC/CASCC and https://zenodo.org/doi/10.5281/zenodo.10648327.
Affiliation(s)
- Lingyi Cai
- Department of Systems Biology, Columbia University, New York, NY 10032, United States
- Department of Electrical Engineering, Columbia University, New York, NY 10027, United States
| | - Dimitris Anastassiou
- Department of Systems Biology, Columbia University, New York, NY 10032, United States
- Department of Electrical Engineering, Columbia University, New York, NY 10027, United States
- Irving Comprehensive Cancer Center, Columbia University, New York, NY 10032, United States
|
74
|
Park Y, Hauschild AC. The effect of data transformation on low-dimensional integration of single-cell RNA-seq. BMC Bioinformatics 2024; 25:171. [PMID: 38689234 PMCID: PMC11059821 DOI: 10.1186/s12859-024-05788-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Accepted: 04/16/2024] [Indexed: 05/02/2024] Open
Abstract
BACKGROUND: Recent developments in single-cell RNA sequencing have opened up a multitude of possibilities to study tissues at the level of cellular populations. However, the heterogeneity in single-cell sequencing data necessitates appropriate procedures to adjust for technological limitations and various sources of noise when integrating datasets from different studies. While many analysis procedures employ various preprocessing steps, they often overlook the importance of selecting and optimizing the employed data transformation methods. RESULTS: This work investigates data transformation approaches used in single-cell clustering analysis tools and their effects on batch integration analysis. In particular, we compare 16 transformations and their impact on the low-dimensional representations, aiming to reduce the batch effect and integrate multiple single-cell sequencing data. Our results show that data transformations strongly influence the results of single-cell clustering on low-dimensional data space, such as those generated by UMAP or PCA. Moreover, these changes in low-dimensional space significantly affect trajectory analysis using multiple datasets, as well. However, the performance of the data transformations greatly varies across datasets, and the optimal method was different for each dataset. Additionally, we explored how data transformation impacts the analysis of deep feature encodings using deep neural network-based models, including autoencoder-based models and proto-typical networks. Data transformation also strongly affects the outcome of deep neural network models. CONCLUSIONS: Our findings suggest that the batch effect and noise in integrative analysis are highly influenced by data transformation. Low-dimensional features can integrate different batches well when proper data transformation is applied. Furthermore, we found that the batch mixing score on low-dimensional space can guide the selection of the optimal data transformation. In conclusion, data preprocessing is one of the most crucial analysis steps and needs to be cautiously considered in the integrative analysis of multiple scRNA-seq datasets.
Affiliation(s)
- Youngjun Park
- Department of Medical Informatics, University Medical Center Göttingen, Göttingen, Germany
- International Max Planck Research Schools for Genome Science, Georg-August-Universität Göttingen, Göttingen, Germany
| | - Anne-Christin Hauschild
- Department of Medical Informatics, University Medical Center Göttingen, Göttingen, Germany.
- Campus-Institute Data Science (CIDAS), Georg-August-Universität Göttingen, Göttingen, Germany.
|
75
|
Wang Y, Chen X, Tang N, Guo M, Ai D. Boosting Clear Cell Renal Carcinoma-Specific Drug Discovery Using a Deep Learning Algorithm and Single-Cell Analysis. Int J Mol Sci 2024; 25:4134. [PMID: 38612943 PMCID: PMC11012314 DOI: 10.3390/ijms25074134] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2024] [Revised: 03/26/2024] [Accepted: 04/03/2024] [Indexed: 04/14/2024] Open
Abstract
Clear cell renal carcinoma (ccRCC), the most common subtype of renal cell carcinoma, is marked by the high heterogeneity of a highly complex tumor microenvironment. Existing clinical intervention strategies, such as targeted therapy and immunotherapy, have failed to achieve good therapeutic effects. In this article, single-cell transcriptome sequencing (scRNA-seq) data from six patients downloaded from the GEO database were adopted to describe the tumor microenvironment (TME) of ccRCC, including its T cells, tumor-associated macrophages (TAMs), endothelial cells (ECs), and cancer-associated fibroblasts (CAFs). Based on the differential typing of the TME, we identified tumor cell-specific regulatory programs that are mediated by three key transcription factors (TFs), whilst the TF EPAS1/HIF-2α was identified via drug virtual screening through our analysis of ccRCC's protein structure. Then, a combined deep graph neural network and machine learning algorithm were used to select anti-ccRCC compounds from bioactive compound libraries, including the FDA-approved drug library, natural product library, and human endogenous metabolite compound library. Finally, five compounds were obtained, including two FDA-approved drugs (flufenamic acid and fludarabine), one endogenous metabolite, one immunology/inflammation-related compound, and one inhibitor of DNA methyltransferase (N4-methylcytidine, a cytosine nucleoside analogue that, like zebularine, has the mechanism of inhibiting DNA methyltransferase). Based on the tumor microenvironment characteristics of ccRCC, five ccRCC-specific compounds were identified, which could guide the clinical treatment of ccRCC patients.
Affiliation(s)
- Dongmei Ai
- School of Mathematics and Physics, University of Science and Technology Beijing, Beijing 100083, China; (Y.W.); (X.C.); (N.T.); (M.G.)
|
76
|
Snyder KT, Creanza N. Birds convey complex signals in simple songs. Nature 2024; 628:37-39. [PMID: 38509289 DOI: 10.1038/d41586-024-00677-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/22/2024]
|
77
|
Grones C, Eekhout T, Shi D, Neumann M, Berg LS, Ke Y, Shahan R, Cox KL, Gomez-Cano F, Nelissen H, Lohmann JU, Giacomello S, Martin OC, Cole B, Wang JW, Kaufmann K, Raissig MT, Palfalvi G, Greb T, Libault M, De Rybel B. Best practices for the execution, analysis, and data storage of plant single-cell/nucleus transcriptomics. The Plant Cell 2024; 36:812-828. [PMID: 38231860 PMCID: PMC10980355 DOI: 10.1093/plcell/koae003] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 05/02/2023] [Revised: 10/17/2023] [Accepted: 10/24/2023] [Indexed: 01/19/2024]
Abstract
Single-cell and single-nucleus RNA-sequencing technologies capture the expression of plant genes at an unprecedented resolution. Therefore, these technologies are gaining traction in plant molecular and developmental biology for elucidating the transcriptional changes across cell types in a specific tissue or organ, upon treatments, in response to biotic and abiotic stresses, or between genotypes. Despite the rapidly accelerating use of these technologies, collective and standardized experimental and analytical procedures to support the acquisition of high-quality data sets are still missing. In this commentary, we discuss common challenges associated with the use of single-cell transcriptomics in plants and propose general guidelines to improve reproducibility, quality, comparability, and interpretation and to make the data readily available to the community in this fast-developing field of research.
Affiliation(s)
- Carolin Grones
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent 9052, Belgium
- VIB Centre for Plant Systems Biology, Ghent 9052, Belgium
| | - Thomas Eekhout
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent 9052, Belgium
- VIB Centre for Plant Systems Biology, Ghent 9052, Belgium
- VIB Single Cell Core Facility, Ghent 9052, Belgium
| | - Dongbo Shi
- Centre for Organismal Studies, Heidelberg University, 69120 Heidelberg, Germany
- Institute of Biochemistry and Biology, University of Potsdam, 14476 Potsdam, Germany
| | - Manuel Neumann
- Institute of Biology, Humboldt-Universität zu Berlin, 10115 Berlin, Germany
| | - Lea S Berg
- Institute of Plant Sciences, University of Bern, 3012 Bern, Switzerland
| | - Yuji Ke
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent 9052, Belgium
- VIB Centre for Plant Systems Biology, Ghent 9052, Belgium
| | - Rachel Shahan
- Department of Biology, Duke University, Durham, NC 27708, USA
- Howard Hughes Medical Institute, Duke University, Durham, NC 27708, USA
| | - Kevin L Cox
- Donald Danforth Plant Science Center, St. Louis, MO 63132, USA
| | - Fabio Gomez-Cano
- Department of Molecular, Cellular, and Developmental Biology, University of Michigan, Ann Arbor, MI 48109, USA
| | - Hilde Nelissen
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent 9052, Belgium
- VIB Centre for Plant Systems Biology, Ghent 9052, Belgium
| | - Jan U Lohmann
- Centre for Organismal Studies, Heidelberg University, 69120 Heidelberg, Germany
| | - Stefania Giacomello
- SciLifeLab, Department of Gene Technology, KTH Royal Institute of Technology, 17165 Solna, Sweden
| | - Olivier C Martin
- Universities of Paris-Saclay, Paris-Cité and Evry, CNRS, INRAE, Institute of Plant Sciences Paris-Saclay, Gif-sur-Yvette 91192, France
| | - Benjamin Cole
- DOE-Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Jia-Wei Wang
- National Key Laboratory of Plant Molecular Genetics, CAS Center for Excellence in Molecular Plant Sciences (CEMPS), Institute of Plant Physiology and Ecology (SIPPE), Chinese Academy of Sciences (CAS), Shanghai 200032, China
| | - Kerstin Kaufmann
- Institute of Biology, Humboldt-Universität zu Berlin, 10115 Berlin, Germany
| | - Michael T Raissig
- Institute of Plant Sciences, University of Bern, 3012 Bern, Switzerland
| | - Gergo Palfalvi
- Department of Comparative Development and Genetics, Max Planck Institute for Plant Breeding Research, 50829 Cologne, Germany
| | - Thomas Greb
- Centre for Organismal Studies, Heidelberg University, 69120 Heidelberg, Germany
| | - Marc Libault
- Division of Plant Science and Technology, Interdisciplinary Plant Group, College of Agriculture, Food, and Natural Resources, University of Missouri-Columbia, Columbia, MO 65201, USA
| | - Bert De Rybel
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent 9052, Belgium
- VIB Centre for Plant Systems Biology, Ghent 9052, Belgium
|
78
|
Dong X, Leary JR, Yang C, Brusko MA, Brusko TM, Bacher R. Data-driven selection of analysis decisions in single-cell RNA-seq trajectory inference. Brief Bioinform 2024; 25:bbae216. [PMID: 38725155 PMCID: PMC11082074 DOI: 10.1093/bib/bbae216] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2024] [Revised: 03/01/2024] [Accepted: 04/25/2024] [Indexed: 05/13/2024] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) experiments have become instrumental in developmental and differentiation studies, enabling the profiling of cells at a single or multiple time-points to uncover subtle variations in expression profiles reflecting underlying biological processes. Benchmarking studies have compared many of the computational methods used to reconstruct cellular dynamics; however, researchers still encounter challenges in their analysis due to uncertainty with respect to selecting the most appropriate methods and parameters. Even among universal data processing steps used by trajectory inference methods such as feature selection and dimension reduction, trajectory methods' performances are highly dataset-specific. To address these challenges, we developed Escort, a novel framework for evaluating a dataset's suitability for trajectory inference and quantifying trajectory properties influenced by analysis decisions. Escort evaluates the suitability of trajectory analysis and the combined effects of processing choices using trajectory-specific metrics. Escort navigates single-cell trajectory analysis through these data-driven assessments, reducing uncertainty and much of the decision burden inherent to trajectory inference analyses. Escort is implemented in an accessible R package and R/Shiny application, providing researchers with the necessary tools to make informed decisions during trajectory analysis and enabling new insights into dynamic biological processes at single-cell resolution.
Affiliation(s)
- Xiaoru Dong
- Department of Biostatistics, College of Public Health and Health Professions, University of Florida, Gainesville, FL 32610, United States
| | - Jack R Leary
- Department of Biostatistics, College of Public Health and Health Professions, University of Florida, Gainesville, FL 32610, United States
| | - Chuanhao Yang
- Department of Biostatistics, College of Public Health and Health Professions, University of Florida, Gainesville, FL 32610, United States
| | - Maigan A Brusko
- Diabetes Institute, University of Florida, Gainesville, FL 32610, United States
- Department of Pathology, Immunology, and Laboratory Medicine, College of Medicine, University of Florida, Gainesville, FL 32610, United States
| | - Todd M Brusko
- Diabetes Institute, University of Florida, Gainesville, FL 32610, United States
- Department of Pathology, Immunology, and Laboratory Medicine, College of Medicine, University of Florida, Gainesville, FL 32610, United States
- Department of Pediatrics, College of Medicine, University of Florida, Gainesville, FL 32610, United States
| | - Rhonda Bacher
- Department of Biostatistics, College of Public Health and Health Professions, University of Florida, Gainesville, FL 32610, United States
- Diabetes Institute, University of Florida, Gainesville, FL 32610, United States
|
79
|
Chen K, Zhou Y, Ding M, Wang Y, Ren Z, Yang Y. Self-supervised learning on millions of primary RNA sequences from 72 vertebrates improves sequence-based RNA splicing prediction. Brief Bioinform 2024; 25:bbae163. [PMID: 38605640 PMCID: PMC11009468 DOI: 10.1093/bib/bbae163] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2024] [Revised: 02/22/2024] [Accepted: 03/19/2024] [Indexed: 04/13/2024] Open
Abstract
Language models pretrained by self-supervised learning (SSL) have been widely utilized to study protein sequences, while few models were developed for genomic sequences and were limited to single species. Due to the lack of genomes from different species, these models cannot effectively leverage evolutionary information. In this study, we have developed SpliceBERT, a language model pretrained on primary ribonucleic acids (RNA) sequences from 72 vertebrates by masked language modeling, and applied it to sequence-based modeling of RNA splicing. Pretraining SpliceBERT on diverse species enables effective identification of evolutionarily conserved elements. Meanwhile, the learned hidden states and attention weights can characterize the biological properties of splice sites. As a result, SpliceBERT was shown effective on several downstream tasks: zero-shot prediction of variant effects on splicing, prediction of branchpoints in humans, and cross-species prediction of splice sites. Our study highlighted the importance of pretraining genomic language models on a diverse range of species and suggested that SSL is a promising approach to enhance our understanding of the regulatory logic underlying genomic sequences.
Affiliation(s)
- Ken Chen
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
| | - Yue Zhou
- Peng Cheng Laboratory, Shenzhen, China
| | - Maolin Ding
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
| | - Yu Wang
- Peng Cheng Laboratory, Shenzhen, China
| | - Yuedong Yang
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
- Key Laboratory of Machine Intelligence and Advanced Computing (Sun Yat-sen University), Ministry of Education, China
|
80
|
Jing Z, Zhu Q, Li L, Xie Y, Wu X, Fang Q, Yang B, Dai B, Xu X, Pan H, Bai Y. Spaco: A comprehensive tool for coloring spatial data at single-cell resolution. Patterns (New York, N.Y.) 2024; 5:100915. [PMID: 38487801 PMCID: PMC10935509 DOI: 10.1016/j.patter.2023.100915] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2023] [Revised: 12/11/2023] [Accepted: 12/18/2023] [Indexed: 03/17/2024]
Abstract
Understanding tissue architecture and niche-specific microenvironments in spatially resolved transcriptomics (SRT) requires in situ annotation and labeling of cells. Effective spatial visualization of these data demands appropriate colorization of numerous cell types. However, current colorization frameworks often inadequately account for the spatial relationships between cell types. This results in perceptual ambiguity in neighboring cells of biologically distinct types, particularly in complex environments such as brain or tumor. To address this, we introduce Spaco, a potent tool for spatially aware colorization. Spaco utilizes the Degree of Interlacement metric to construct a weighted graph that evaluates the spatial relationships among different cell types, refining color assignments. Furthermore, Spaco incorporates an adaptive palette selection approach to amplify chromatic distinctions. When benchmarked on four diverse datasets, Spaco outperforms existing solutions, capturing complex spatial relationships and boosting visual clarity. Spaco ensures broad accessibility by accommodating color vision deficiency and offering openly accessible code in both Python and R.
Affiliation(s)
- Zehua Jing
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- BGI Research, Hangzhou 310012, China
| | - Linxuan Li
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- BGI Research, Shenzhen 518083, China
| | - Yue Xie
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- BGI Research, Shenzhen 518083, China
| | - Xinchao Wu
- BGI Research, Hangzhou 310012, China
- School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
| | - Qi Fang
- BGI Research, Shenzhen 518083, China
| | - Bolin Yang
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- BGI Research, Hangzhou 310012, China
| | - Baojun Dai
- BGI Research, Hangzhou 310012, China
- School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
| | - Xun Xu
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Guangdong Provincial Key Laboratory of Genome Read and Write, BGI Research, Shenzhen 518083, China
- BGI Research, Shenzhen 518083, China
| | - Hailin Pan
- BGI Research, Hangzhou 310012, China
- BGI Research, Shenzhen 518083, China
| | - Yinqi Bai
- BGI Research, Hangzhou 310012, China
- BGI Research, Shenzhen 518083, China
|
81
|
Ma R, Sun ED, Donoho D, Zou J. Principled and interpretable alignability testing and integration of single-cell data. Proc Natl Acad Sci U S A 2024; 121:e2313719121. [PMID: 38416677 DOI: 10.1073/pnas.2313719121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2023] [Accepted: 01/23/2024] [Indexed: 03/01/2024] Open
Abstract
Single-cell data integration can provide a comprehensive molecular view of cells, and many algorithms have been developed to remove unwanted technical or biological variations and integrate heterogeneous single-cell datasets. Despite their wide usage, existing methods suffer from several fundamental limitations. In particular, we lack a rigorous statistical test for whether two high-dimensional single-cell datasets are alignable (and therefore should even be aligned). Moreover, popular methods can substantially distort the data during alignment, making the aligned data and downstream analysis difficult to interpret. To overcome these limitations, we present a spectral manifold alignment and inference (SMAI) framework, which enables principled and interpretable alignability testing and structure-preserving integration of single-cell data with the same type of features. SMAI provides a statistical test to robustly assess the alignability between datasets to avoid misleading inference and is justified by high-dimensional statistical theory. On a diverse range of real and simulated benchmark datasets, it outperforms commonly used alignment methods. Moreover, we show that SMAI improves various downstream analyses such as identification of differentially expressed genes and imputation of single-cell spatial transcriptomics, providing further biological insights. SMAI's interpretability also enables quantification and a deeper understanding of the sources of technical confounders in single-cell data.
Affiliation(s)
- Rong Ma
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115
| | - Eric D Sun
- Department of Biomedical Data Science, Stanford University, Stanford, CA 94305
| | - David Donoho
- Department of Statistics, Stanford University, Stanford, CA 94305
| | - James Zou
- Department of Biomedical Data Science, Stanford University, Stanford, CA 94305
|
82
|
Woodruff MC, Faliti CE, Sanz I. Systems biology of B cells in COVID-19. Semin Immunol 2024; 72:101875. [PMID: 38489999 PMCID: PMC11988200 DOI: 10.1016/j.smim.2024.101875] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Revised: 03/04/2024] [Accepted: 03/04/2024] [Indexed: 03/17/2024]
Abstract
The integration of multi-'omic datasets into complex systems-wide assessments has become a mainstay in immunologic investigation. This focus on high-dimensional data collection and analysis was on full display in the investigation of COVID-19, the respiratory illness resulting from infection by the novel coronavirus SARS-CoV-2. Particularly in the area of B cell biology, tremendous efforts in both cellular and serologic investigation have resulted in an increasingly detailed mapping of the coordinated effector, memory, and antibody secreting cell responses that underpin the development of humoral immunity in response to primary viral infection. Further, the rapid development and deployment of effective vaccines has allowed for the assessment of developing memory responses across a wide variety of immune contexts, including in patients with compromised immune function. The result has been a period of rapid gains in the understanding of B cell biology unrestricted to the study of COVID-19. Here, we outline the systems-level technologies that have been routinely implemented in these investigations throughout the pandemic, and discuss how their use has led to clear and applicable gains in pursuance of the amelioration of human infectious disease and beyond.
Affiliation(s)
- Matthew C Woodruff
- Department of Medicine, Division of Rheumatology, Lowance Center for Human Immunology, Emory University, Atlanta, GA, USA; Emory Autoimmunity Center of Excellence, Emory University, Atlanta, GA, USA.
| | - Caterina E Faliti
- Department of Medicine, Division of Rheumatology, Lowance Center for Human Immunology, Emory University, Atlanta, GA, USA; Emory Autoimmunity Center of Excellence, Emory University, Atlanta, GA, USA.
| | - Ignacio Sanz
- Department of Medicine, Division of Rheumatology, Lowance Center for Human Immunology, Emory University, Atlanta, GA, USA; Emory Autoimmunity Center of Excellence, Emory University, Atlanta, GA, USA
|
83
|
Xia L, Lee C, Li JJ. Statistical method scDEED for detecting dubious 2D single-cell embeddings and optimizing t-SNE and UMAP hyperparameters. Nat Commun 2024; 15:1753. [PMID: 38409103 PMCID: PMC10897166 DOI: 10.1038/s41467-024-45891-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2023] [Accepted: 02/06/2024] [Indexed: 02/28/2024] Open
Abstract
Two-dimensional (2D) embedding methods are crucial for single-cell data visualization. Popular methods such as t-distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP) are commonly used for visualizing cell clusters; however, it is well known that t-SNE and UMAP's 2D embeddings might not reliably inform the similarities among cell clusters. Motivated by this challenge, we present a statistical method, scDEED, for detecting dubious cell embeddings output by a 2D-embedding method. By calculating a reliability score for every cell embedding based on the similarity between the cell's 2D-embedding neighbors and pre-embedding neighbors, scDEED identifies the cell embeddings with low reliability scores as dubious and those with high reliability scores as trustworthy. Moreover, by minimizing the number of dubious cell embeddings, scDEED provides intuitive guidance for optimizing the hyperparameters of an embedding method. We show the effectiveness of scDEED on multiple datasets for detecting dubious cell embeddings and optimizing the hyperparameters of t-SNE and UMAP.
Affiliation(s)
- Lucy Xia
- Department of ISOM, School of Business and Management, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China
| | - Christy Lee
- Department of Statistics and Data Science, University of California, Los Angeles, Los Angeles, CA, USA
| | - Jingyi Jessica Li
- Department of Statistics and Data Science, University of California, Los Angeles, Los Angeles, CA, USA.
- Department of Biostatistics, University of California, Los Angeles, Los Angeles, CA, USA.
- Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA, USA.
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA, USA.
- Radcliffe Institute of Advanced Study, Harvard University, Cambridge, MA, USA.
|
84
|
Walsh JR, Sun G, Balan J, Hardcastle J, Vollenweider J, Jerde C, Rumilla K, Koellner C, Koleilat A, Hasadsri L, Kipp B, Jenkinson G, Klee E. A supervised learning method for classifying methylation disorders. BMC Bioinformatics 2024; 25:66. [PMID: 38347515 PMCID: PMC10863277 DOI: 10.1186/s12859-024-05673-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Accepted: 01/24/2024] [Indexed: 02/15/2024] Open
Abstract
BACKGROUND: DNA methylation is one of the most stable and well-characterized epigenetic alterations in humans. Accordingly, it has already found clinical utility as a molecular biomarker in a variety of disease contexts. Existing methods for clinical diagnosis of methylation-related disorders focus on outlier detection in a small number of CpG sites using standardized cutoffs which differentiate healthy from abnormal methylation levels. The standardized cutoff values used in these methods do not take into account methylation patterns which are known to differ between the sexes and with age. RESULTS: Here we profile genome-wide DNA methylation from blood samples drawn from within a cohort composed of healthy controls of different age and sex alongside patients with Prader-Willi syndrome (PWS), Beckwith-Wiedemann syndrome, Fragile-X syndrome, Angelman syndrome, and Silver-Russell syndrome. We propose a Generalized Additive Model to perform age and sex adjusted outlier analysis of around 700,000 CpG sites throughout the human genome. Utilizing z-scores among the cohort for each site, we deployed an ensemble based machine learning pipeline and achieved a combined prediction accuracy of 0.96 (Binomial 95% Confidence Interval 0.868-0.995). CONCLUSION: We demonstrate a method for age and sex adjusted outlier detection of differentially methylated loci based on a large cohort of healthy individuals. We present a custom machine learning pipeline utilizing this outlier analysis to classify samples for potential methylation associated congenital disorders. These methods are able to achieve high accuracy when used with machine learning methods to classify abnormal methylation patterns.
Affiliation(s)
- Alaa Koleilat
- Department of Molecular and Medical Genetics, Oregon Health and Science University, Portland, OR, USA
|
85
|
Qiu C, Martin BK, Welsh IC, Daza RM, Le TM, Huang X, Nichols EK, Taylor ML, Fulton O, O'Day DR, Gomes AR, Ilcisin S, Srivatsan S, Deng X, Disteche CM, Noble WS, Hamazaki N, Moens CB, Kimelman D, Cao J, Schier AF, Spielmann M, Murray SA, Trapnell C, Shendure J. A single-cell time-lapse of mouse prenatal development from gastrula to birth. Nature 2024; 626:1084-1093. [PMID: 38355799 PMCID: PMC10901739 DOI: 10.1038/s41586-024-07069-w] [Citation(s) in RCA: 57] [Impact Index Per Article: 57.0] [Received: 04/05/2023] [Accepted: 01/15/2024] [Indexed: 02/16/2024]
Abstract
The house mouse (Mus musculus) is an exceptional model system, combining genetic tractability with close evolutionary affinity to humans [1,2]. Mouse gestation lasts only 3 weeks, during which the genome orchestrates the astonishing transformation of a single-cell zygote into a free-living pup composed of more than 500 million cells. Here, to establish a global framework for exploring mammalian development, we applied optimized single-cell combinatorial indexing [3] to profile the transcriptional states of 12.4 million nuclei from 83 embryos, precisely staged at 2- to 6-hour intervals spanning late gastrulation (embryonic day 8) to birth (postnatal day 0). From these data, we annotate hundreds of cell types and explore the ontogenesis of the posterior embryo during somitogenesis and of kidney, mesenchyme, retina and early neurons. We leverage the temporal resolution and sampling depth of these whole-embryo snapshots, together with published data [4-8] from earlier timepoints, to construct a rooted tree of cell-type relationships that spans the entirety of prenatal development, from zygote to birth. Throughout this tree, we systematically nominate genes encoding transcription factors and other proteins as candidate drivers of the in vivo differentiation of hundreds of cell types. Remarkably, the most marked temporal shifts in cell states are observed within one hour of birth and presumably underlie the massive physiological adaptations that must accompany the successful transition of a mammalian fetus to life outside the womb.
Affiliation(s)
- Chengxiang Qiu
- Department of Genome Sciences, University of Washington, Seattle, WA, USA.
| | - Beth K Martin
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Riza M Daza
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Truc-Mai Le
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
| | - Xingfan Huang
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
| | - Eva K Nichols
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Megan L Taylor
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
| | - Olivia Fulton
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Diana R O'Day
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
| | - Saskia Ilcisin
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
| | - Sanjay Srivatsan
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Medical Scientist Training Program, University of Washington, Seattle, WA, USA
| | - Xinxian Deng
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, USA
| | - Christine M Disteche
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, USA
- Department of Medicine, University of Washington, Seattle, WA, USA
| | - William Stafford Noble
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
| | - Nobuhiko Hamazaki
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Howard Hughes Medical Institute, Seattle, WA, USA
| | - Cecilia B Moens
- Division of Basic Sciences, Fred Hutchinson Cancer Center, Seattle, WA, USA
| | - David Kimelman
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Department of Biochemistry, University of Washington, Seattle, WA, USA
| | - Junyue Cao
- Laboratory of Single-Cell Genomics and Population dynamics, The Rockefeller University, New York, NY, USA
| | - Alexander F Schier
- Biozentrum, University of Basel, Basel, Switzerland
- Allen Discovery Center for Cell Lineage Tracing, Seattle, WA, USA
| | - Malte Spielmann
- Max Planck Institute for Molecular Genetics, Berlin, Germany
- Institute of Human Genetics, University Hospitals Schleswig-Holstein, University of Lübeck and Kiel University, Lübeck, Kiel, Germany
- DZHK (German Centre for Cardiovascular Research), Partner Site Hamburg, Lübeck, Kiel, Lübeck, Germany
| | | | - Cole Trapnell
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Allen Discovery Center for Cell Lineage Tracing, Seattle, WA, USA
- Seattle Hub for Synthetic Biology, Seattle, WA, USA
| | - Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle, WA, USA.
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA.
- Howard Hughes Medical Institute, Seattle, WA, USA.
- Allen Discovery Center for Cell Lineage Tracing, Seattle, WA, USA.
- Seattle Hub for Synthetic Biology, Seattle, WA, USA.
| |
|
86
|
Zheng L, Shi S, Lu M, Fang P, Pan Z, Zhang H, Zhou Z, Zhang H, Mou M, Huang S, Tao L, Xia W, Li H, Zeng Z, Zhang S, Chen Y, Li Z, Zhu F. AnnoPRO: a strategy for protein function annotation based on multi-scale protein representation and a hybrid deep learning of dual-path encoding. Genome Biol 2024; 25:41. [PMID: 38303023 PMCID: PMC10832132 DOI: 10.1186/s13059-024-03166-1] [Citation(s) in RCA: 33] [Impact Index Per Article: 33.0] [Received: 05/06/2023] [Accepted: 01/05/2024] [Indexed: 02/03/2024] Open
Abstract
Protein function annotation has been one of the longstanding issues in biological sciences, and various computational methods have been developed. However, the existing methods suffer from a serious long-tail problem, with a large number of GO families containing few annotated proteins. Herein, an innovative strategy named AnnoPRO was therefore constructed by enabling sequence-based multi-scale protein representation, dual-path protein encoding using pre-training, and function annotation by long short-term memory-based decoding. A variety of case studies based on different benchmarks were conducted, which confirmed the superior performance of AnnoPRO among available methods. Source code and models have been made freely available at: https://github.com/idrblab/AnnoPRO and https://zenodo.org/records/10012272.
Affiliation(s)
- Lingyan Zheng
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
- Industry Solutions Research and Development, Alibaba Cloud Computing, Hangzhou, 330110, China
| | - Shuiyang Shi
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
| | - Mingkun Lu
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
| | - Pan Fang
- Industry Solutions Research and Development, Alibaba Cloud Computing, Hangzhou, 330110, China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou, 330110, China
| | - Ziqi Pan
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
| | - Hongning Zhang
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
| | - Zhimeng Zhou
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
| | - Hanyu Zhang
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
| | - Minjie Mou
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
| | - Shijie Huang
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
| | - Lin Tao
- Key Laboratory of Elemene Class Anti-Cancer Chinese Medicines, Engineering Laboratory of Development and Application of Traditional Chinese Medicines, Collaborative Innovation Center of Traditional Chinese Medicines of Zhejiang Province, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
| | - Weiqi Xia
- Pharmaceutical Department, Zhejiang Provincial People's Hospital, Hangzhou, 310014, China
| | - Honglin Li
- School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
| | - Zhenyu Zeng
- Industry Solutions Research and Development, Alibaba Cloud Computing, Hangzhou, 330110, China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou, 330110, China
| | - Shun Zhang
- Industry Solutions Research and Development, Alibaba Cloud Computing, Hangzhou, 330110, China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou, 330110, China
| | - Yuzong Chen
- State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, The Graduate School at Shenzhen, Tsinghua University, Shenzhen, 518055, China
| | - Zhaorong Li
- Industry Solutions Research and Development, Alibaba Cloud Computing, Hangzhou, 330110, China.
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou, 330110, China.
| | - Feng Zhu
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China.
- Industry Solutions Research and Development, Alibaba Cloud Computing, Hangzhou, 330110, China.
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou, 330110, China.
| |
|
87
|
Tyler SR, Lozano-Ojalvo D, Guccione E, Schadt EE. Anti-correlated feature selection prevents false discovery of subpopulations in scRNAseq. Nat Commun 2024; 15:699. [PMID: 38267438 PMCID: PMC10808220 DOI: 10.1038/s41467-023-43406-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2022] [Accepted: 11/07/2023] [Indexed: 01/26/2024] Open
Abstract
While sub-clustering cell populations has become popular in single-cell omics, negative controls for this process are lacking. Popular feature-selection/clustering algorithms fail the null-dataset problem, allowing erroneous subdivisions of homogeneous clusters until nearly each cell is called its own cluster. Using real and synthetic datasets, we find that anti-correlated gene selection reduces or eliminates erroneous subdivisions, increases marker-gene selection efficacy, and efficiently scales to millions of cells.
Affiliation(s)
- Scott R Tyler
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Department of Oncological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
| | - Daniel Lozano-Ojalvo
- Department of Dermatology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Ernesto Guccione
- Department of Oncological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Center for Therapeutics Discovery, Department of Oncological Sciences and Pharmacological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Bioinformatics for Next Generation Sequencing (BiNGS) Shared Resource Facility, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Eric E Schadt
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
| |
|
88
|
Lederer AR, Leonardi M, Talamanca L, Herrera A, Droin C, Khven I, Carvalho HJF, Valente A, Mantes AD, Arabí PM, Pinello L, Naef F, Manno GL. Statistical inference with a manifold-constrained RNA velocity model uncovers cell cycle speed modulations. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.01.18.576093. [PMID: 38328127 PMCID: PMC10849531 DOI: 10.1101/2024.01.18.576093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/09/2024]
Abstract
Across a range of biological processes, cells undergo coordinated changes in gene expression, resulting in transcriptome dynamics that unfold within a low-dimensional manifold. Single-cell RNA-sequencing (scRNA-seq) only measures temporal snapshots of gene expression. However, information on the underlying low-dimensional dynamics can be extracted using RNA velocity, which models unspliced and spliced RNA abundances to estimate the rate of change of gene expression. Available RNA velocity algorithms can be fragile and rely on heuristics that lack statistical control. Moreover, the estimated vector field is not dynamically consistent with the traversed gene expression manifold. Here, we develop a generative model of RNA velocity and a Bayesian inference approach that solves these problems. Our model couples velocity field and manifold estimation in a reformulated, unified framework, so as to coherently identify the parameters of an autonomous dynamical system. Focusing on the cell cycle, we implemented VeloCycle to study gene regulation dynamics on one-dimensional periodic manifolds and validated using live-imaging its ability to infer actual cell cycle periods. We benchmarked RNA velocity inference with sensitivity analyses and demonstrated one- and multiple-sample testing. We also conducted Markov chain Monte Carlo inference on the model, uncovering key relationships between gene-specific kinetics and our gene-independent velocity estimate. Finally, we applied VeloCycle to in vivo samples and in vitro genome-wide Perturb-seq, revealing regionally-defined proliferation modes in neural progenitors and the effect of gene knockdowns on cell cycle speed. Ultimately, VeloCycle expands the scRNA-seq analysis toolkit with a modular and statistically rigorous RNA velocity inference framework.
|
89
|
Yao Q, Jia Y, Ma P. cellMarkerPipe: Cell Marker Identification and Evaluation Pipeline in Single Cell Transcriptomes. RESEARCH SQUARE 2024:rs.3.rs-3844718. [PMID: 38313296 PMCID: PMC10836098 DOI: 10.21203/rs.3.rs-3844718/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/06/2024]
Abstract
Assessing marker genes from all cell clusters can be time-consuming and lack a systematic strategy. Streamlining this process through a unified computational platform that automates identification and benchmarking will greatly enhance efficiency and ensure a fair evaluation. We therefore developed a novel computational platform, cellMarkerPipe (https://github.com/yao-laboratory/cellMarkerPipe), for automated cell-type specific marker gene identification from scRNA-seq data, coupled with a comprehensive evaluation schema. CellMarkerPipe adaptively wraps around a collection of commonly used and state-of-the-art tools, including Seurat, COSG, SC3, SCMarker, COMET, and scGeneFit. From rigorous testing across diverse samples, we ascertain SCMarker's overall reliable performance in single marker gene selection, with COSG showing commendable speed and comparable efficacy. Furthermore, we demonstrate the pivotal role of our approach in real-world medical datasets. This general and open-source pipeline stands as a significant advancement in streamlining cell marker gene identification and evaluation, fitting broad applications in the field of cellular biology and medical research.
|
90
|
Chrysinas P, Venkatesan S, Ang I, Ghosh V, Chen C, Neelamegham S, Gunawan R. Cell and tissue-specific glycosylation pathways informed by single-cell transcriptomics. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.09.26.559616. [PMID: 38260527 PMCID: PMC10802235 DOI: 10.1101/2023.09.26.559616] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
While single cell studies have made significant impacts in various subfields of biology, they lag in the Glycosciences. To address this gap, we analyzed single-cell glycogene expressions in the Tabula Sapiens dataset of human tissues and cell types using a recent glycosylation-specific gene ontology (GlycoEnzOnto). At the median sequencing (count) depth, ~40-50 out of 400 glycogenes were detected in individual cells. Upon increasing the sequencing depth, the number of detectable glycogenes saturates at ~200 glycogenes, suggesting that the average human cell expresses about half of the glycogene repertoire. Hierarchies in glycogene and glycopathway expressions emerged from our analysis: nucleotide-sugar synthesis and transport exhibited the highest gene expressions, followed by genes for core enzymes, glycan modification and extensions, and finally terminal modifications. Interestingly, the same cell types showed variable glycopathway expressions based on their organ or tissue origin, suggesting nuanced cell- and tissue-specific glycosylation patterns. Probing deeper into the transcription factors (TFs) of glycogenes, we identified distinct groupings of TFs controlling different aspects of glycosylation: core biosynthesis, terminal modifications, etc. We present webtools to explore the interconnections across glycogenes, glycopathways, and TFs regulating glycosylation in human cell/tissue types. Overall, the study presents an overview of glycosylation across multiple human organ systems.
Affiliation(s)
- Panagiotis Chrysinas
- Department of Chemical and Biological Engineering, University at Buffalo-SUNY, Buffalo, NY, 14260, USA
| | - Shriramprasad Venkatesan
- Department of Chemical and Biological Engineering, University at Buffalo-SUNY, Buffalo, NY, 14260, USA
| | - Isaac Ang
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL, 61801, USA
| | - Vishnu Ghosh
- Department of Chemical and Biological Engineering, University at Buffalo-SUNY, Buffalo, NY, 14260, USA
| | - Changyou Chen
- Department of Computer Science and Engineering, University at Buffalo-SUNY, Buffalo, NY, 14260, USA
| | - Sriram Neelamegham
- Department of Chemical and Biological Engineering, University at Buffalo-SUNY, Buffalo, NY, 14260, USA
| | - Rudiyanto Gunawan
- Department of Chemical and Biological Engineering, University at Buffalo-SUNY, Buffalo, NY, 14260, USA
| |
|
91
|
Bump P, Lubeck L. Marine Invertebrates One Cell at A Time: Insights from Single-Cell Analysis. Integr Comp Biol 2023; 63:999-1009. [PMID: 37188638 PMCID: PMC10714908 DOI: 10.1093/icb/icad034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2023] [Revised: 04/25/2023] [Accepted: 05/05/2023] [Indexed: 05/17/2023] Open
Abstract
Over the past decade, single-cell RNA-sequencing (scRNA-seq) has made it possible to study the cellular diversity of a broad range of organisms. Technological advances in single-cell isolation and sequencing have expanded rapidly, allowing the transcriptomic profile of individual cells to be captured. As a result, there has been an explosion of cell type atlases created for many different marine invertebrate species from across the tree of life. Our focus in this review is to synthesize current literature on marine invertebrate scRNA-seq. Specifically, we provide perspectives on key insights from scRNA-seq studies, including descriptive studies of cell type composition, how cells respond in dynamic processes such as development and regeneration, and the evolution of new cell types. Despite these tremendous advances, there also lie several challenges ahead. We discuss the important considerations that are essential when making comparisons between experiments, or between datasets from different species. Finally, we address the future of single-cell analyses in marine invertebrates, including combining scRNA-seq data with other 'omics methods to get a fuller understanding of cellular complexities. The full diversity of cell types across marine invertebrates remains unknown and understanding this diversity and evolution will provide rich areas for future study.
Affiliation(s)
- Paul Bump
- Department of Organismic and Evolutionary Biology, Museum of Comparative Zoology, Harvard University, Cambridge, MA 02138, USA
| | - Lauren Lubeck
- Department of Biology, Hopkins Marine Station, Stanford University, Pacific Grove, CA 93950, USA
| |
|
92
|
Wang JH, Tsin D, Engel TA. Predictive variational autoencoder for learning robust representations of time-series data. ARXIV 2023:arXiv:2312.06932v1. [PMID: 38168462 PMCID: PMC10760197] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2024]
Abstract
Variational autoencoders (VAEs) have been used extensively to discover low-dimensional latent factors governing neural activity and animal behavior. However, without careful model selection, the uncovered latent factors may reflect noise in the data rather than true underlying features, rendering such representations unsuitable for scientific interpretation. Existing solutions to this problem involve introducing additional measured variables or data augmentations specific to a particular data type. We propose a VAE architecture that predicts the next point in time and show that it mitigates the learning of spurious features. In addition, we introduce a model selection metric based on smoothness over time in the latent space. We show that together these two constraints on VAEs to be smooth over time produce robust latent representations and faithfully recover latent factors on synthetic datasets.
Affiliation(s)
- Julia H Wang
- Cold Spring Harbor Laboratory School of Biological Sciences, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA
| | - Dexter Tsin
- Princeton Neuroscience Institute, Princeton University, Princeton, New Jersey, USA
| | - Tatiana A Engel
- Princeton Neuroscience Institute, Princeton University, Princeton, New Jersey, USA
| |
|
93
|
Ghaddar B, De S. Hierarchical and automated cell-type annotation and inference of cancer cell of origin with Census. Bioinformatics 2023; 39:btad714. [PMID: 38011649 PMCID: PMC10713118 DOI: 10.1093/bioinformatics/btad714] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2023] [Revised: 10/26/2023] [Accepted: 11/25/2023] [Indexed: 11/29/2023] Open
Abstract
MOTIVATION: Cell-type annotation is a time-consuming yet critical first step in the analysis of single-cell RNA-seq data, especially when multiple similar cell subtypes with overlapping marker genes are present. Existing automated annotation methods have a number of limitations, including requiring large reference datasets, high computation time, shallow annotation resolution, and difficulty in identifying cancer cells or their most likely cell of origin. RESULTS: We developed Census, a biologically intuitive and fully automated cell-type identification method for single-cell RNA-seq data that can deeply annotate normal cells in mammalian tissues and identify malignant cells and their likely cell of origin. Motivated by the inherently stratified developmental programs of cellular differentiation, Census infers hierarchical cell-type relationships and uses gradient-boosted decision trees that capitalize on nodal cell-type relationships to achieve high prediction speed and accuracy. When benchmarked on 44 atlas-scale normal and cancer, human and mouse tissues, Census significantly outperforms state-of-the-art methods across multiple metrics and naturally predicts the cell-of-origin of different cancers. Census is pretrained on the Tabula Sapiens to classify 175 cell-types from 24 organs; however, users can seamlessly train their own models for customized applications. AVAILABILITY AND IMPLEMENTATION: Census is available at Zenodo https://zenodo.org/records/7017103 and on our GitHub https://github.com/sjdlabgroup/Census.
Affiliation(s)
- Bassel Ghaddar
- Center for Systems and Computational Biology, Rutgers Cancer Institute of New Jersey, Rutgers University, New Brunswick, NJ 08901, United States
| | - Subhajyoti De
- Center for Systems and Computational Biology, Rutgers Cancer Institute of New Jersey, Rutgers University, New Brunswick, NJ 08901, United States
| |
|
94
|
Shinn M. Phantom oscillations in principal component analysis. Proc Natl Acad Sci U S A 2023; 120:e2311420120. [PMID: 37988465 PMCID: PMC10691246 DOI: 10.1073/pnas.2311420120] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 07/10/2023] [Accepted: 10/18/2023] [Indexed: 11/23/2023] Open
Abstract
Principal component analysis (PCA) is a dimensionality reduction method that is known for being simple and easy to interpret. Principal components are often interpreted as low-dimensional patterns in high-dimensional space. However, this simple interpretation fails for timeseries, spatial maps, and other continuous data. In these cases, nonoscillatory data may have oscillatory principal components. Here, we show that two common properties of data cause oscillatory principal components: smoothness and shifts in time or space. These two properties implicate almost all neuroscience data. We show how the oscillations produced by PCA, which we call "phantom oscillations," impact data analysis. We also show that traditional cross-validation does not detect phantom oscillations, so we suggest procedures that do. Our findings are supported by a collection of mathematical proofs. Collectively, our work demonstrates that patterns which emerge from high-dimensional data analysis may not faithfully represent the underlying data.
Affiliation(s)
- Maxwell Shinn
- University College London (UCL) Queen Square Institute of Neurology, University College London, London WC1E 6BT, United Kingdom
| |
|
95
|
Miles CE, McKinley SA, Ding F, Lehoucq RB. Inferring stochastic rates from heterogeneous snapshots of particle positions. ARXIV 2023:arXiv:2311.04880v1. [PMID: 37986720 PMCID: PMC10659442] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2023]
Abstract
Many imaging techniques for biological systems - like fixation of cells coupled with fluorescence microscopy - provide sharp spatial resolution in reporting locations of individuals at a single moment in time but also destroy the dynamics they intend to capture. These snapshot observations contain no information about individual trajectories, but still encode information about movement and demographic dynamics, especially when combined with a well-motivated biophysical model. The relationship between spatially evolving populations and single-moment representations of their collective locations is well-established with partial differential equations (PDEs) and their inverse problems. However, experimental data is commonly a set of locations whose number is insufficient to approximate a continuous-in-space PDE solution. Here, motivated by popular subcellular imaging data of gene expression, we embrace the stochastic nature of the data and investigate the mathematical foundations of parametrically inferring demographic rates from snapshots of particles undergoing birth, diffusion, and death in a nuclear or cellular domain. Toward inference, we rigorously derive a connection between individual particle paths and their presentation as a Poisson spatial process. Using this framework, we investigate the properties of the resulting inverse problem and study factors that affect quality of inference. One pervasive feature of this experimental regime is the presence of cell-to-cell heterogeneity. Rather than being a hindrance, we show that cell-to-cell geometric heterogeneity can increase the quality of inference on dynamics for certain parameter regimes. Altogether, the results serve as a basis for more detailed investigations of subcellular spatial patterns of RNA molecules and other stochastically evolving populations that can only be observed for single instants in their time evolution.
Affiliation(s)
| | | | - Fangyuan Ding
- Department of Biomedical Engineering, University of California, Irvine
| | | |
|
96
|
Falconnier C, Caparros-Roissard A, Decraene C, Lutz PE. Functional genomic mechanisms of opioid action and opioid use disorder: a systematic review of animal models and human studies. Mol Psychiatry 2023; 28:4568-4584. [PMID: 37723284 PMCID: PMC10914629 DOI: 10.1038/s41380-023-02238-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2022] [Revised: 08/17/2023] [Accepted: 08/24/2023] [Indexed: 09/20/2023]
Abstract
In the past two decades, over-prescription of opioids for pain management has driven a steep increase in opioid use disorder (OUD) and death by overdose, exerting a dramatic toll on western countries. OUD is a chronic relapsing disease associated with a lifetime struggle to control drug consumption, suggesting that opioids trigger long-lasting brain adaptations, notably through functional genomic and epigenomic mechanisms. Current understanding of these processes, however, remains scarce, and has not been previously reviewed systematically. To do so, the goal of the present work was to synthesize current knowledge on genome-wide transcriptomic and epigenetic mechanisms of opioid action, in primate and rodent species. Using a prospectively registered methodology, comprehensive literature searches were completed in PubMed, Embase, and Web of Science. Of the 2709 articles identified, 73 met our inclusion criteria and were considered for qualitative analysis. Focusing on the 5 most studied nervous system structures (nucleus accumbens, frontal cortex, whole striatum, dorsal striatum, spinal cord; 44 articles), we also conducted a quantitative analysis of differentially expressed genes, in an effort to identify a putative core transcriptional signature of opioids. Only one gene, Cdkn1a, was consistently identified in eleven studies, and globally, our results unveil surprisingly low consistency across published work, even when considering the most recent single-cell approaches. Analysis of sources of variability detected significant contributions from species, brain structure, duration of opioid exposure, strain, time-point of analysis, and batch effects, but not type of opioid. To go beyond those limitations, we leveraged threshold-free methods to illustrate how genome-wide comparisons may generate new findings and hypotheses. Finally, we discuss current methodological developments in the field, and their implications for future research and, ultimately, better care.
Affiliation(s)
- Camille Falconnier
- Centre National de la Recherche Scientifique, Université de Strasbourg, Institut des Neurosciences Cellulaires et Intégratives UPR 3212, 67000, Strasbourg, France
| | - Alba Caparros-Roissard
- Centre National de la Recherche Scientifique, Université de Strasbourg, Institut des Neurosciences Cellulaires et Intégratives UPR 3212, 67000, Strasbourg, France
| | - Charles Decraene
- Centre National de la Recherche Scientifique, Université de Strasbourg, Institut des Neurosciences Cellulaires et Intégratives UPR 3212, 67000, Strasbourg, France
- Centre National de la Recherche Scientifique, Université de Strasbourg, Laboratoire de Neurosciences Cognitives et Adaptatives UMR 7364, 67000, Strasbourg, France
| | - Pierre-Eric Lutz
- Centre National de la Recherche Scientifique, Université de Strasbourg, Institut des Neurosciences Cellulaires et Intégratives UPR 3212, 67000, Strasbourg, France.
- Douglas Mental Health University Institute, Montreal, QC, Canada.
| |
|
97
|
Yampolskaya M, Herriges MJ, Ikonomou L, Kotton DN, Mehta P. scTOP: physics-inspired order parameters for cellular identification and visualization. Development 2023; 150:dev201873. [PMID: 37756586 PMCID: PMC10629677 DOI: 10.1242/dev.201873] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/11/2023] [Accepted: 09/11/2023] [Indexed: 09/29/2023]
Abstract
Advances in single-cell RNA sequencing provide an unprecedented window into cellular identity. The abundance of data requires new theoretical and computational frameworks to analyze the dynamics of differentiation and integrate knowledge from cell atlases. We present 'single-cell Type Order Parameters' (scTOP): a statistical, physics-inspired approach for quantifying cell identity given a reference basis of cell types. scTOP can accurately classify cells, visualize developmental trajectories and assess the fidelity of engineered cells. Importantly, scTOP does this without feature selection, statistical fitting or dimensional reduction (e.g. uniform manifold approximation and projection, principal component analysis, etc.). We illustrate the power of scTOP using human and mouse datasets. By reanalyzing mouse lung data, we characterize a transient hybrid alveolar type 1/alveolar type 2 cell population. Visualizations of lineage tracing hematopoiesis data using scTOP confirm that a single clone can give rise to multiple mature cell types. We assess the transcriptional similarity between endogenous and donor-derived cells in the context of murine pulmonary cell transplantation. Our results suggest that physics-inspired order parameters can be an important tool for understanding differentiation and characterizing engineered cells. scTOP is available as an easy-to-use Python package.
Affiliation(s)
- Michael J. Herriges
- Center for Regenerative Medicine of Boston University and Boston Medical Center, Boston, MA 02118, USA
- The Pulmonary Center and Department of Medicine, Boston University School of Medicine, Boston, MA 02118, USA
- Laertis Ikonomou
- Department of Oral Biology, University at Buffalo, The State University of New York, Buffalo, NY 14215, USA
- Division of Pulmonary, Critical Care and Sleep Medicine, Department of Medicine, University at Buffalo, The State University of New York, Buffalo, NY 14215, USA
- Darrell N. Kotton
- Center for Regenerative Medicine of Boston University and Boston Medical Center, Boston, MA 02118, USA
- The Pulmonary Center and Department of Medicine, Boston University School of Medicine, Boston, MA 02118, USA
- Pankaj Mehta
- Department of Physics, Boston University, Boston, MA 02215, USA
- Center for Regenerative Medicine of Boston University and Boston Medical Center, Boston, MA 02118, USA
- Faculty of Computing and Data Science, Boston University, Boston, MA 02215, USA
- Biological Design Center, Boston University, Boston, MA 02215, USA
|
98
|
Carbonetto P, Luo K, Sarkar A, Hung A, Tayeb K, Pott S, Stephens M. GoM DE: interpreting structure in sequence count data with differential expression analysis allowing for grades of membership. Genome Biol 2023; 24:236. [PMID: 37858253 PMCID: PMC10588049 DOI: 10.1186/s13059-023-03067-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 03/03/2023] [Accepted: 09/20/2023] [Indexed: 10/21/2023] Open
Abstract
Parts-based representations, such as non-negative matrix factorization and topic modeling, have been used to identify structure from single-cell sequencing data sets, in particular structure that is not as well captured by clustering or other dimensionality reduction methods. However, interpreting the individual parts remains a challenge. To address this challenge, we extend methods for differential expression analysis by allowing cells to have partial membership to multiple groups. We call this grade of membership differential expression (GoM DE). We illustrate the benefits of GoM DE for annotating topics identified in several single-cell RNA-seq and ATAC-seq data sets.
Affiliation(s)
- Peter Carbonetto
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Research Computing Center, University of Chicago, Chicago, IL, USA
- Kaixuan Luo
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Abhishek Sarkar
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Vesalius Therapeutics, Cambridge, MA, USA
- Anthony Hung
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Section of Genetic Medicine, University of Chicago, Chicago, IL, USA
- Karl Tayeb
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Committee on Genetics, Genomics and Systems Biology, University of Chicago, Chicago, IL, USA
- Sebastian Pott
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Section of Genetic Medicine, University of Chicago, Chicago, IL, USA
- Matthew Stephens
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Department of Statistics, University of Chicago, Chicago, IL, USA
|
99
|
Read JF, Serralha M, Armitage JD, Iqbal MM, Cruickshank MN, Saxena A, Strickland DH, Waithman J, Holt PG, Bosco A. Single cell transcriptomics reveals cell type specific features of developmentally regulated responses to lipopolysaccharide between birth and 5 years. Front Immunol 2023; 14:1275937. [PMID: 37920467 PMCID: PMC10619903 DOI: 10.3389/fimmu.2023.1275937] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Accepted: 10/04/2023] [Indexed: 11/04/2023] Open
Abstract
Background: Human perinatal life is characterized by a period of extraordinary change during which newborns encounter abundant environmental stimuli and exposure to potential pathogens. To meet such challenges, the neonatal immune system is equipped with unique functional characteristics that adapt to changing conditions as development progresses across the early years of life, but the molecular characteristics of such adaptations remain poorly understood. The application of single cell genomics to birth cohorts provides an opportunity to investigate changes in gene expression programs elicited downstream of innate immune activation across early life at unprecedented resolution.
Methods: In this study, we performed single cell RNA-sequencing of mononuclear cells collected from matched birth cord blood and 5-year peripheral blood samples following stimulation (18 h) with two well-characterized innate stimuli: lipopolysaccharide (LPS) and polyinosinic:polycytidylic acid (Poly(I:C)).
Results: We found that the transcriptional response to LPS was constrained at birth and predominantly partitioned into classical proinflammatory gene upregulation primarily by monocytes and interferon (IFN)-signaling gene upregulation by lymphocytes. Moreover, these responses featured substantial cell-to-cell communication which appeared markedly strengthened between birth and 5 years. In contrast, stimulation with Poly(I:C) induced a robust IFN-signaling response across all cell types identified at birth and 5 years. Analysis of gene regulatory networks revealed that IRF1 and STAT1 were key drivers of the LPS-induced IFN-signaling response in lymphocytes, with a potential developmental role for IRF7 regulation. Additionally, we observed distinct activation trajectory endpoints for monocytes derived from LPS-treated cord and 5-year blood, which was not apparent among Poly(I:C)-induced monocytes.
Conclusion: Taken together, our findings provide new insight into the gene regulatory landscape of immune cell function between birth and 5 years and point to regulatory mechanisms relevant to future investigation of infection susceptibility in early life.
Affiliation(s)
- James F. Read
- Asthma and Airway Disease Research Center, University of Arizona, Tucson, AZ, United States
- Telethon Kids Institute, The University of Western Australia, Perth, WA, Australia
- Michael Serralha
- Telethon Kids Institute, The University of Western Australia, Perth, WA, Australia
- Jesse D. Armitage
- Telethon Kids Institute, The University of Western Australia, Perth, WA, Australia
- School of Biomedical Sciences, The University of Western Australia, Nedlands, Western Australia, Australia
- Muhammad Munir Iqbal
- Genomics WA, Joint Initiative of Telethon Kids Institute, Harry Perkins Institute of Medical Research and The University of Western Australia, Nedlands, WA, Australia
- Mark N. Cruickshank
- School of Biomedical Sciences, The University of Western Australia, Nedlands, Western Australia, Australia
- Alka Saxena
- Genomics WA, Joint Initiative of Telethon Kids Institute, Harry Perkins Institute of Medical Research and The University of Western Australia, Nedlands, WA, Australia
- Deborah H. Strickland
- Telethon Kids Institute, The University of Western Australia, Perth, WA, Australia
- UWA Centre for Child Health Research, The University of Western Australia, Nedlands, WA, Australia
- Jason Waithman
- School of Biomedical Sciences, The University of Western Australia, Nedlands, Western Australia, Australia
- Patrick G. Holt
- Telethon Kids Institute, The University of Western Australia, Perth, WA, Australia
- UWA Centre for Child Health Research, The University of Western Australia, Nedlands, WA, Australia
- Anthony Bosco
- Asthma and Airway Disease Research Center, University of Arizona, Tucson, AZ, United States
- Department of Immunobiology, The University of Arizona College of Medicine, Tucson, AZ, United States
|
100
|
Tseng KC, Crump JG. Craniofacial developmental biology in the single-cell era. Development 2023; 150:dev202077. [PMID: 37812056 PMCID: PMC10617621 DOI: 10.1242/dev.202077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/10/2023]
Abstract
The evolution of a unique craniofacial complex in vertebrates made possible new ways of breathing, eating, communicating and sensing the environment. The head and face develop through interactions of all three germ layers, the endoderm, ectoderm and mesoderm, as well as the so-called fourth germ layer, the cranial neural crest. Over a century of experimental embryology and genetics have revealed an incredible diversity of cell types derived from each germ layer, signaling pathways and genes that coordinate craniofacial development, and how changes to these underlie human disease and vertebrate evolution. Yet for many diseases and congenital anomalies, we have an incomplete picture of the causative genomic changes, in particular how alterations to the non-coding genome might affect craniofacial gene expression. Emerging genomics and single-cell technologies provide an opportunity to obtain a more holistic view of the genes and gene regulatory elements orchestrating craniofacial development across vertebrates. These single-cell studies generate novel hypotheses that can be experimentally validated in vivo. In this Review, we highlight recent advances in single-cell studies of diverse craniofacial structures, as well as potential pitfalls and the need for extensive in vivo validation. We discuss how these studies inform the developmental sources and regulation of head structures, bringing new insights into the etiology of structural birth anomalies that affect the vertebrate head.
Affiliation(s)
- Kuo-Chang Tseng
- Department of Stem Cell Biology and Regenerative Medicine, Keck School of Medicine of University of Southern California, Los Angeles, CA 90033, USA
- J. Gage Crump
- Department of Stem Cell Biology and Regenerative Medicine, Keck School of Medicine of University of Southern California, Los Angeles, CA 90033, USA
|