1
Goldmann U, Wiedmer T, Garofoli A, Sedlyarov V, Bichler M, Haladik B, Wolf G, Christodoulaki E, Ingles-Prieto A, Ferrada E, Frommelt F, Teoh ST, Leippe P, Onea G, Pfeifer M, Kohlbrenner M, Chang L, Selzer P, Reinhardt J, Digles D, Ecker GF, Osthushenrich T, MacNamara A, Malarstig A, Hepworth D, Superti-Furga G. Data- and knowledge-derived functional landscape of human solute carriers. Mol Syst Biol 2025. [PMID: 40355757 DOI: 10.1038/s44320-025-00108-2] [Received: 10/17/2024] [Revised: 03/28/2025] [Accepted: 04/11/2025] [Indexed: 05/15/2025]
Abstract
The human solute carrier (SLC) superfamily of ~460 membrane transporters remains the largest understudied protein family despite its therapeutic potential. To advance SLC research, we developed a comprehensive knowledgebase that integrates systematic multi-omics data sets with selected curated information from public sources. We annotated SLC substrates through literature curation, compiled SLC disease associations using data mining techniques, and determined the subcellular localization of SLCs by combining annotations from public databases with an immunofluorescence imaging approach. This SLC-centric knowledge is made accessible to the scientific community via a web portal featuring interactive dashboards and visualization tools. Utilizing this systematically collected and curated resource, we computationally derived an integrated functional landscape for the entire human SLC superfamily. We identified clusters with distinct properties and established functional distances between transporters. Based on all available data sets and their integration, we assigned biochemical/biological functions to each SLC, making this study one of the largest systematic annotations of human gene function and a potential blueprint for future research endeavors.
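The abstract mentions establishing functional distances between transporters but does not give the formula. As a rough illustration only, one simple way to turn annotations into a distance is the Jaccard distance between annotation sets. The sketch below assumes each transporter is described by a set of substrate labels; the transporter names, substrates, and helper functions are hypothetical and are not the knowledgebase's data or method.

```python
from itertools import combinations


def jaccard_distance(a: set[str], b: set[str]) -> float:
    """Return 1 - |A & B| / |A | B|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def pairwise_distances(annotations: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Compute the Jaccard distance for every unordered pair of transporters."""
    return {
        (x, y): jaccard_distance(annotations[x], annotations[y])
        for x, y in combinations(sorted(annotations), 2)
    }


# Hypothetical toy annotations (illustrative only, not curated data from the paper).
toy = {
    "SLC_A": {"glucose", "galactose"},
    "SLC_B": {"glucose", "fructose"},
    "SLC_C": {"glutamine"},
}

for pair, d in pairwise_distances(toy).items():
    print(pair, round(d, 3))
```

Transporters sharing no annotated substrate end up at distance 1.0, while partial overlap gives intermediate values. A real integrated landscape would combine many data types, not a single annotation set.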
Affiliation(s)
- Ulrich Goldmann
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Tabea Wiedmer
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Andrea Garofoli
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Vitaly Sedlyarov
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Manuel Bichler
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Ben Haladik
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- St. Anna Children's Cancer Research Institute, Vienna, Austria
- Gernot Wolf
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Eirini Christodoulaki
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Alvaro Ingles-Prieto
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Evandro Ferrada
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Fabian Frommelt
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Shao Thing Teoh
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Philipp Leippe
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Gabriel Onea
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Department of Pediatrics and Adolescent Medicine, Medical University of Vienna, Vienna, Austria
- Daniela Digles
- University of Vienna, Department of Pharmaceutical Sciences, Vienna, Austria
- Gerhard F Ecker
- University of Vienna, Department of Pharmaceutical Sciences, Vienna, Austria
- Giulio Superti-Furga
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Center for Physiology and Pharmacology, Medical University of Vienna, Vienna, Austria
- Fondazione Ri.MED, Palermo, Italy
2
Guan A, Quek C. Single-Cell Multi-Omics: Insights into Therapeutic Innovations to Advance Treatment in Cancer. Int J Mol Sci 2025; 26:2447. [PMID: 40141092 PMCID: PMC11942442 DOI: 10.3390/ijms26062447] [Received: 02/06/2025] [Revised: 03/04/2025] [Accepted: 03/07/2025] [Indexed: 03/28/2025]
Abstract
Advances in single-cell multi-omics technologies have deepened our understanding of cancer biology by integrating genomic, transcriptomic, epigenomic, and proteomic data at single-cell resolution. These single-cell multi-omics technologies provide unprecedented insights into tumour heterogeneity, tumour microenvironment, and mechanisms of therapeutic resistance, enabling the development of precision medicine strategies. The emerging field of single-cell multi-omics in genomic medicine has improved patient outcomes. However, most clinical applications still depend on bulk genomic approaches, which fail to directly capture the genomic variations driving cellular heterogeneity. In this review, we explore the common single-cell multi-omics platforms and discuss key analytical steps for data integration. Furthermore, we highlight emerging knowledge in therapeutic resistance and immune evasion, and the potential of new therapeutic innovations informed by single-cell multi-omics. Finally, we discuss the future directions of the application of single-cell multi-omics technologies. By bridging the gap between technological advancements and clinical implementation, this review provides a roadmap for leveraging single-cell multi-omics to improve cancer treatment and patient outcomes.
Affiliation(s)
- Angel Guan
- Melanoma Institute Australia, The University of Sydney, Sydney, NSW 2065, Australia
- Faculty of Medicine and Health, The University of Sydney, Sydney, NSW 2006, Australia
- Charles Perkins Centre, The University of Sydney, Sydney, NSW 2006, Australia
- Camelia Quek
- Melanoma Institute Australia, The University of Sydney, Sydney, NSW 2065, Australia
- Faculty of Medicine and Health, The University of Sydney, Sydney, NSW 2006, Australia
- Charles Perkins Centre, The University of Sydney, Sydney, NSW 2006, Australia
3
Sousa RT, Paulheim H. Gene expression knowledge graph for patient representation and diabetes prediction. J Biomed Semantics 2025; 16:2. [PMID: 40057806 PMCID: PMC11889825 DOI: 10.1186/s13326-025-00325-6] [Received: 11/15/2024] [Accepted: 02/26/2025] [Indexed: 05/13/2025]
Abstract
Diabetes is a worldwide health issue affecting millions of people. Machine learning methods have shown promising results in improving diabetes prediction, particularly through the analysis of gene expression data. While gene expression data can provide valuable insights, challenges arise because the number of patients in expression datasets is usually limited, and data from different datasets with different gene expression profiles cannot be easily combined. This work proposes a novel approach to address these challenges: it uses knowledge graphs (KGs), a unique tool for biomedical data integration, to integrate multiple gene expression datasets with domain-specific knowledge and to learn uniform patient representations for subjects contained in different, incompatible datasets. Different strategies and KG embedding methods are explored to generate vector representations, which serve as inputs for a classifier. Extensive experiments demonstrate the efficacy of our approach, revealing weighted F1-score improvements in diabetes prediction of up to 13% when integrating multiple gene expression datasets and domain-specific knowledge about protein functions and interactions.
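The abstract says KG embedding methods turn nodes of the integrated graph, including patients, into vectors that then feed a classifier, but it does not name the methods. To make the idea concrete, the sketch below shows the scoring function of TransE, one widely used KG embedding method, chosen here purely for illustration. The entity names, the relation vector, and the 2-D embeddings are hypothetical; in practice the embeddings are learned from the graph rather than set by hand.

```python
import math

Vector = list[float]


def transe_score(head: Vector, relation: Vector, tail: Vector) -> float:
    """TransE plausibility score: the negative L2 norm of head + relation - tail.

    Scores closer to 0 mean the triple (head, relation, tail) is more plausible.
    """
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


# Hypothetical 2-D embeddings; real ones would be learned by training on the
# knowledge graph's triples (e.g. with a margin-based ranking loss).
embeddings = {
    "patient_1": [0.0, 0.0],
    "GENE_X": [1.0, 0.0],
    "GENE_Y": [0.0, 1.0],
}
relation = [1.0, 0.1]  # hypothetical relation vector, e.g. "over-expresses"

# Rank candidate tail entities for the triple (patient_1, relation, ?).
ranked = sorted(
    ("GENE_X", "GENE_Y"),
    key=lambda g: transe_score(embeddings["patient_1"], relation, embeddings[g]),
    reverse=True,
)
print(ranked)
```

Once trained, a patient node's vector (here `embeddings["patient_1"]`) is the kind of representation that could be handed to a downstream classifier.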
Affiliation(s)
- Rita T Sousa
- Data and Web Science Group, University of Mannheim, 68159, Mannheim, Germany
- Heiko Paulheim
- Data and Web Science Group, University of Mannheim, 68159, Mannheim, Germany
4
Zhao B, Song K, Wei DQ, Xiong Y, Ding J. scCobra allows contrastive cell embedding learning with domain adaptation for single cell data integration and harmonization. Commun Biol 2025; 8:233. [PMID: 39948393 PMCID: PMC11825689 DOI: 10.1038/s42003-025-07692-x] [Received: 07/12/2024] [Accepted: 02/06/2025] [Indexed: 02/16/2025]
Abstract
The rapid advancement of single-cell technologies has created an urgent need for effective methods to integrate and harmonize single-cell data. Technical and biological variations across studies complicate data integration, while conventional tools often struggle with reliance on gene expression distribution assumptions and over-correction. Here, we present scCobra, a deep generative neural network designed to overcome these challenges through contrastive learning with domain adaptation. scCobra effectively mitigates batch effects, minimizes over-correction, and ensures biologically meaningful data integration without assuming specific gene expression distributions. It enables online label transfer across datasets with batch effects, allowing continuous integration of new data without retraining. Additionally, scCobra supports batch effect simulation, advanced multi-omic integration, and scalable processing of large datasets. By integrating and harmonizing datasets from similar studies, scCobra expands the available data for investigating specific biological problems, improving cross-study comparability, and revealing insights that may be obscured in isolated datasets.
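scCobra is described as using contrastive learning, but the abstract does not spell out its loss. The sketch below computes a generic InfoNCE-style contrastive loss for a single cell embedding: two views of the same cell are pulled together and views of other cells are pushed apart. It is an illustration under stated assumptions, not scCobra's implementation, and it omits the domain-adaptation component entirely; the vectors and the temperature `tau` are hypothetical.

```python
import math

Vector = list[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(anchor: Vector, positive: Vector, negatives: list[Vector], tau: float = 0.1) -> float:
    """InfoNCE loss for one anchor: -log(exp(s_pos/tau) / sum_j exp(s_j/tau)).

    `positive` is another view of the same cell (e.g. an augmented copy) and
    `negatives` are views of other cells; tau is the temperature.
    """
    logits = [cosine(anchor, positive) / tau] + [cosine(anchor, n) / tau for n in negatives]
    m = max(logits)  # subtract the max before exponentiating for numerical stability
    log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denominator - logits[0]


# Hypothetical 2-D embeddings of one cell, an augmented view of it, and two other cells.
anchor = [1.0, 0.0]
positive = [0.9, 0.1]
negatives = [[0.0, 1.0], [-1.0, 0.0]]
print(round(info_nce(anchor, positive, negatives), 4))
```

The loss is small when the anchor is much more similar to its positive than to any negative, and it grows as a negative overtakes the positive.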
Affiliation(s)
- Bowen Zhao
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Meakins-Christie Laboratories, Department of Medicine, McGill University Health Centre, Montreal, QC, Canada
- Division of Experimental Medicine, Department of Medicine, McGill University, Montreal, QC, Canada
- Kailu Song
- Meakins-Christie Laboratories, Department of Medicine, McGill University Health Centre, Montreal, QC, Canada
- Quantitative Life Sciences, McGill University, Montreal, QC, Canada
- Dong-Qing Wei
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Yi Xiong
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Jun Ding
- Meakins-Christie Laboratories, Department of Medicine, McGill University Health Centre, Montreal, QC, Canada
- Division of Experimental Medicine, Department of Medicine, McGill University, Montreal, QC, Canada
- Quantitative Life Sciences, McGill University, Montreal, QC, Canada
- School of Computer Science, McGill University, Montreal, QC, Canada
- Mila-Quebec AI Institute, Montreal, QC, Canada
5
Choi H, Kim H, Chung H, Lee DS, Kim J. Application of computational algorithms for single-cell RNA-seq and ATAC-seq in neurodegenerative diseases. Brief Funct Genomics 2025; 24:elae044. [PMID: 39500613 PMCID: PMC11735751 DOI: 10.1093/bfgp/elae044] [Received: 07/01/2024] [Revised: 09/29/2024] [Accepted: 11/04/2024] [Indexed: 01/18/2025]
Abstract
Recent advancements in single-cell technologies, including single-cell RNA sequencing (scRNA-seq) and Assay for Transposase-Accessible Chromatin using sequencing (scATAC-seq), have greatly improved our insight into the epigenomic landscapes across various biological contexts and diseases. This paper reviews key computational tools and machine learning approaches that integrate scRNA-seq and scATAC-seq data to facilitate the alignment of transcriptomic data with chromatin accessibility profiles. Applying these integrated single-cell technologies in neurodegenerative diseases, such as Alzheimer's disease and Parkinson's disease, reveals how changes in chromatin accessibility and gene expression can illuminate pathogenic mechanisms and identify potential therapeutic targets. Despite facing challenges like data sparsity and computational demands, ongoing enhancements in scATAC-seq and scRNA-seq technologies, along with better analytical methods, continue to expand their applications. These advancements promise to revolutionize our approach to medical research and clinical diagnostics, offering a comprehensive view of cellular function and disease pathology.
Affiliation(s)
- Hwisoo Choi
- Department of Bioinformatics, Soongsil University, 369 Sangdo-Ro, Dongjak-Gu, Seoul 06978, Republic of Korea
- Hyeonkyu Kim
- Department of Bioinformatics, Soongsil University, 369 Sangdo-Ro, Dongjak-Gu, Seoul 06978, Republic of Korea
- Hoebin Chung
- Department of Bioinformatics, Soongsil University, 369 Sangdo-Ro, Dongjak-Gu, Seoul 06978, Republic of Korea
- Dong-Sung Lee
- Department of Biomedical Sciences, Seoul National University Graduate School, 103 Daehak-ro, Jongno-gu, Seoul 03080, Republic of Korea
- Genomic Medicine Institute, Medical Research Center, Seoul National University, 103 Daehak-ro, Jongno-gu, Seoul 03080, Republic of Korea
- Junil Kim
- Department of Bioinformatics, Soongsil University, 369 Sangdo-Ro, Dongjak-Gu, Seoul 06978, Republic of Korea
6
Zhou Y, Tang C, Xiao X, Zhan X, Wang T, Xiao G, Xu L. Dimensionality reduction for visualizing spatially resolved profiling data using SpaSNE. Gigascience 2025; 14:giaf002. [PMID: 39960663 PMCID: PMC11831803 DOI: 10.1093/gigascience/giaf002] [Received: 05/03/2024] [Revised: 11/05/2024] [Accepted: 01/06/2025] [Indexed: 02/20/2025]
Abstract
BACKGROUND Spatially resolved profiling technologies to quantify transcriptomes, epigenomes, and proteomes have been emerging as groundbreaking methods for comprehensive molecular characterizations. Dimensionality reduction and visualization is an essential step to analyze and interpret spatially resolved profiling data. However, state-of-the-art dimensionality reduction methods for single-cell sequencing data, such as the t-distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP), were not tailored for spatially resolved profiling data. RESULTS Here we developed a spatially resolved t-SNE (SpaSNE) method to integrate both spatial and molecular information. We applied it to a variety of public spatially resolved profiling datasets that were generated from 3 experimental platforms and consisted of cells from different diseases, tissues, and cell types. To compare the performances of SpaSNE, t-SNE, and UMAP, we applied them to 4 spatially resolved profiling datasets obtained from 3 distinct experimental platforms (Visium, STARmap, and MERFISH) on both diseased and normal tissues. Comparisons between SpaSNE and these state-of-the-art approaches reveal that SpaSNE achieves more accurate and meaningful visualization that better elucidates the underlying spatial and molecular data structures. CONCLUSIONS This work demonstrates the broad application of SpaSNE for reliable and robust interpretation of cell types based on both molecular and spatial information, which can set the foundation for many subsequent analysis steps, such as differential gene expression and trajectory or pseudotime analysis on the spatially resolved profiling data.
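SpaSNE's own objective adds spatial terms to the t-SNE cost, and the abstract does not give its formula. The sketch below only illustrates the simpler, related idea of mixing molecular and spatial information: it blends normalised expression distances and normalised tissue-coordinate distances with a hypothetical weight `alpha`. This is a swapped-in simplification, not SpaSNE; the toy data and function names are assumptions. A blended matrix like this could be passed to a t-SNE implementation that accepts precomputed distance matrices.

```python
import math

Point = tuple[float, ...]


def pairwise(points: list[Point]) -> list[list[float]]:
    """Euclidean distance matrix for a list of points."""
    return [[math.dist(p, q) for q in points] for p in points]


def normalised(matrix: list[list[float]]) -> list[list[float]]:
    """Scale a distance matrix so its largest entry is 1 (all-zero matrices are returned unchanged)."""
    top = max(max(row) for row in matrix)
    return [[d / top for d in row] for row in matrix] if top > 0 else matrix


def blended_distances(expr: list[Point], coords: list[Point], alpha: float) -> list[list[float]]:
    """Weighted mix of molecular and spatial distances; alpha=0 ignores space entirely."""
    mol = normalised(pairwise(expr))
    spa = normalised(pairwise(coords))
    return [
        [(1 - alpha) * m + alpha * s for m, s in zip(mrow, srow)]
        for mrow, srow in zip(mol, spa)
    ]


# Hypothetical toy data: 3 spots with expression of 2 genes and 2-D tissue coordinates.
expr = [(5.0, 0.0), (4.0, 1.0), (0.0, 5.0)]
coords = [(0.0, 0.0), (10.0, 0.0), (0.0, 1.0)]
D = blended_distances(expr, coords, alpha=0.5)
print([[round(d, 3) for d in row] for row in D])
```

Normalising both matrices first keeps one modality's units (e.g. microns versus counts) from dominating the blend.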
Affiliation(s)
- Yuansheng Zhou
- Quantitative Biomedical Research Center, Peter O'Donnell Jr. School of Public Health, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Chen Tang
- Quantitative Biomedical Research Center, Peter O'Donnell Jr. School of Public Health, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Xue Xiao
- Quantitative Biomedical Research Center, Peter O'Donnell Jr. School of Public Health, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Xiaowei Zhan
- Quantitative Biomedical Research Center, Peter O'Donnell Jr. School of Public Health, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Tao Wang
- Quantitative Biomedical Research Center, Peter O'Donnell Jr. School of Public Health, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Guanghua Xiao
- Quantitative Biomedical Research Center, Peter O'Donnell Jr. School of Public Health, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Lin Xu
- Quantitative Biomedical Research Center, Peter O'Donnell Jr. School of Public Health, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Department of Pediatrics, Division of Hematology/Oncology, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
7
Sun Y, Kong L, Huang J, Deng H, Bian X, Li X, Cui F, Dou L, Cao C, Zou Q, Zhang Z. A comprehensive survey of dimensionality reduction and clustering methods for single-cell and spatial transcriptomics data. Brief Funct Genomics 2024; 23:733-744. [PMID: 38860675 DOI: 10.1093/bfgp/elae023] [Received: 12/26/2023] [Revised: 02/29/2024] [Accepted: 05/27/2024] [Indexed: 06/12/2024]
Abstract
In recent years, the application of single-cell transcriptomics and spatial transcriptomics analysis techniques has become increasingly widespread. Whether dealing with single-cell transcriptomic or spatial transcriptomic data, dimensionality reduction and clustering are indispensable. Both single-cell and spatial transcriptomic data are often high-dimensional, making the analysis and visualization of such data challenging. Through dimensionality reduction, it becomes possible to visualize the data in a lower-dimensional space, allowing for the observation of relationships and differences between cell subpopulations. Clustering enables the grouping of similar cells into the same cluster, aiding in the identification of distinct cell subpopulations and revealing cellular diversity, providing guidance for downstream analyses. In this review, we systematically summarized the most widely recognized algorithms employed for the dimensionality reduction and clustering analysis of single-cell transcriptomic and spatial transcriptomic data. This endeavor provides valuable insights and ideas that can contribute to the development of novel tools in this rapidly evolving field.
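The abstract explains that clustering groups similar cells into the same cluster after dimensionality reduction, without listing which algorithms the survey covers. As one classic example, the sketch below implements Lloyd's k-means on a hypothetical 2-D embedding of cells. It is an illustrative choice, not a claim about which methods the review recommends; the data, the seed, and the iteration cap are assumptions.

```python
import math
import random

Point = tuple[float, ...]


def kmeans(points: list[Point], k: int, iters: int = 100, seed: int = 0) -> list[int]:
    """Lloyd's k-means: return one cluster label (0..k-1) per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k randomly chosen input points
    labels: list[int] = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its member points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels


# Hypothetical 2-D embedding of six cells forming two well-separated groups.
cells = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels = kmeans(cells, k=2)
print(labels)
```

The label numbers themselves are arbitrary; what matters is that cells in the same group share a label.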
Affiliation(s)
- Yidi Sun
- School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Lingling Kong
- School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Jiayi Huang
- School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Hongyan Deng
- School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Xinling Bian
- School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Xingfeng Li
- School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Feifei Cui
- School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Lijun Dou
- Genomic Medicine Institute, Lerner Research Institute, Cleveland, OH 44106, United States
- Chen Cao
- School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing 210029, China
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
- Zilong Zhang
- School of Computer Science and Technology, Hainan University, Haikou 570228, China
8
Zhang B, He P, Lawrence JEG, Wang S, Tuck E, Williams BA, Roberts K, Kleshchevnikov V, Mamanova L, Bolt L, Polanski K, Li T, Elmentaite R, Fasouli ES, Prete M, He X, Yayon N, Fu Y, Yang H, Liang C, Zhang H, Blain R, Chedotal A, FitzPatrick DR, Firth H, Dean A, Bayraktar OA, Marioni JC, Barker RA, Storer MA, Wold BJ, Zhang H, Teichmann SA. A human embryonic limb cell atlas resolved in space and time. Nature 2024; 635:668-678. [PMID: 38057666 PMCID: PMC7616500 DOI: 10.1038/s41586-023-06806-x] [Received: 04/15/2022] [Accepted: 10/31/2023] [Indexed: 12/08/2023]
Abstract
Human limbs emerge during the fourth post-conception week as mesenchymal buds, which develop into fully formed limbs over the subsequent months [1]. This process is orchestrated by numerous temporally and spatially restricted gene expression programmes, making congenital alterations in phenotype common [2]. Decades of work with model organisms have defined the fundamental mechanisms underlying vertebrate limb development, but an in-depth characterization of this process in humans has yet to be performed. Here we detail human embryonic limb development across space and time using single-cell and spatial transcriptomics. We demonstrate extensive diversification of cells from a few multipotent progenitors to myriad differentiated cell states, including several novel cell populations. We uncover two waves of human muscle development, each characterized by different cell states regulated by separate gene expression programmes, and identify musculin (MSC) as a key transcriptional repressor maintaining muscle stem cell identity. Through assembly of multiple anatomically continuous spatial transcriptomic samples using VisiumStitcher, we map cells across a sagittal section of a whole fetal hindlimb. We reveal a clear anatomical segregation between genes linked to brachydactyly and polysyndactyly, and uncover transcriptionally and spatially distinct populations of the mesenchyme in the autopod. Finally, we perform single-cell RNA sequencing on mouse embryonic limbs to facilitate cross-species developmental comparison, finding substantial homology between the two species.
Affiliation(s)
- Bao Zhang
- The Key Laboratory for Stem Cells and Tissue Engineering, Ministry of Education, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Peng He
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- John E G Lawrence
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Department of Trauma and Orthopaedics, Cambridge University Hospitals NHS Foundation Trust, Addenbrooke's Hospital, Cambridge, UK
- Shuaiyu Wang
- The Key Laboratory for Stem Cells and Tissue Engineering, Ministry of Education, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Department of Obstetrics, Guangzhou Institute of Pediatrics, Guangzhou Women and Children's Medical Center, Guangzhou Medical University, Guangzhou, China
- Elizabeth Tuck
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Brian A Williams
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
- Kenny Roberts
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Lira Mamanova
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Enhanc3D Genomics Ltd, Cambridge, UK
- Liam Bolt
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Genomics England, London, UK
- Tong Li
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Rasa Elmentaite
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Eirini S Fasouli
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Basic Research Center, Biomedical Research Foundation, Academy of Athens, Athens, Greece
- Martin Prete
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Xiaoling He
- John van Geest Centre for Brain Repair, Department of Clinical Neurosciences, University of Cambridge, Cambridge, UK
- Wellcome-MRC Cambridge Stem Cell Institute, University of Cambridge, Cambridge, UK
- Nadav Yayon
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Yixi Fu
- The Key Laboratory for Stem Cells and Tissue Engineering, Ministry of Education, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Hao Yang
- The Key Laboratory for Stem Cells and Tissue Engineering, Ministry of Education, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Chen Liang
- The Key Laboratory for Stem Cells and Tissue Engineering, Ministry of Education, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Hui Zhang
- Institute of Human Virology, Key Laboratory of Tropical Disease Control of Ministry of Education, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Raphael Blain
- Sorbonne Université, INSERM, CNRS, Institut de la Vision, Paris, France
- Alain Chedotal
- Sorbonne Université, INSERM, CNRS, Institut de la Vision, Paris, France
- Institut de pathologie, groupe hospitalier Est, hospices civils de Lyon, Lyon, France
- University Claude Bernard Lyon 1, MeLiS, CNRS UMR5284, INSERM U1314, Lyon, France
- Helen Firth
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Andrew Dean
- Department of Clinical Neurosciences, Cambridge University Hospitals NHS Foundation, Cambridge, UK
- John C Marioni
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Roger A Barker
- John van Geest Centre for Brain Repair, Department of Clinical Neurosciences, University of Cambridge, Cambridge, UK
- Wellcome-MRC Cambridge Stem Cell Institute, University of Cambridge, Cambridge, UK
- Mekayla A Storer
- Wellcome-MRC Cambridge Stem Cell Institute, University of Cambridge, Cambridge, UK
- Barbara J Wold
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
- Hongbo Zhang
- The Key Laboratory for Stem Cells and Tissue Engineering, Ministry of Education, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Advanced Medical Technology Center, the First Affiliated Hospital, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Department of Histology and Embryology, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Sarah A Teichmann
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Theory of Condensed Matter Group, Department of Physics, Cavendish Laboratory, University of Cambridge, Cambridge, UK
9
Rong Z, Song J, Yu Y, Mi L, Qiu M, Song Y, Hou Y. Single-cell mosaic integration and cell state transfer with auto-scaling self-attention mechanism. Brief Bioinform 2024; 25:bbae540. [PMID: 39438079 PMCID: PMC11495875 DOI: 10.1093/bib/bbae540] [Received: 04/02/2024] [Revised: 09/02/2024] [Accepted: 10/10/2024] [Indexed: 10/25/2024]
Abstract
The integration of data from multiple modalities generated by single-cell omics technologies is crucial for accurately identifying cell states. One challenge in analysing multi-omics data is mosaic integration, in which different data modalities are profiled in different subsets of cells, because it requires simultaneous batch effect removal and modality alignment. Here, we develop Multi-omics Mosaic Auto-scaling Attention Variational Inference (mmAAVI), a scalable deep generative model for single-cell mosaic integration. Leveraging auto-scaling self-attention mechanisms, mmAAVI can map arbitrary combinations of omics to a common embedding space. If well-annotated cell states are available, the model can perform semi-supervised learning to make use of these existing annotations. We validated the performance of mmAAVI and five other commonly used methods on four benchmark datasets, which vary in cell numbers, omics types, and missing patterns. mmAAVI consistently outperformed the other methods. We also validated mmAAVI's ability to transfer cell state knowledge, achieving balanced accuracies of 0.82 and 0.97 with less than 1% of cells labeled, between batches with completely different omics. The full package is available at https://github.com/luyiyun/mmAAVI.
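The label-transfer results above are reported as balanced accuracy, the mean of per-class recall, which keeps rare cell states from being swamped by common ones. The sketch below computes that standard metric from scratch; the cell-type labels are hypothetical, and scikit-learn's `balanced_accuracy_score` computes the same quantity if a library implementation is preferred.

```python
from collections import defaultdict


def balanced_accuracy(y_true: list[str], y_pred: list[str]) -> float:
    """Mean of per-class recall, so rare cell states count as much as common ones."""
    total: dict[str, int] = defaultdict(int)
    correct: dict[str, int] = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)


# Hypothetical labels: 8 T cells and 2 B cells; the classifier misses one B cell.
y_true = ["T"] * 8 + ["B"] * 2
y_pred = ["T"] * 8 + ["B", "T"]
print(balanced_accuracy(y_true, y_pred))  # (8/8 + 1/2) / 2 = 0.75
```

Plain accuracy on the same labels would be 0.9, which hides that half of the rare B cells were misclassified.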
Affiliation(s)
- Zhiwei Rong
- Department of Biostatistics, School of Public Health, Peking University, 38 Xueyuan Rd., Haidian District, Beijing 100191, China
- Jiali Song
- Department of Biostatistics, School of Public Health, Peking University, 38 Xueyuan Rd., Haidian District, Beijing 100191, China
- Yipei Yu
- Department of Biostatistics, School of Public Health, Peking University, 38 Xueyuan Rd., Haidian District, Beijing 100191, China
- Lan Mi
- Peking University Cancer Hospital, 52 Fucheng Rd., Haidian District, Beijing 100142, China
- ManTang Qiu
- Department of Thoracic Surgery, Peking University People’s Hospital, No. 11 Xizhimen South Street, Xicheng District, Beijing 100044, China
- Yuqin Song
- Peking University Cancer Hospital, 52 Fucheng Rd., Haidian District, Beijing 100142, China
- Yan Hou
- Department of Biostatistics, School of Public Health, Peking University, 38 Xueyuan Rd., Haidian District, Beijing 100191, China
- Peking University Cancer Hospital, 52 Fucheng Rd., Haidian District, Beijing 100142, China
- Peking University Clinical Research Center, Peking University, 38 Xueyuan Rd., Haidian District, Beijing 100191, China
10
Samaran J, Peyré G, Cantini L. scConfluence: single-cell diagonal integration with regularized Inverse Optimal Transport on weakly connected features. Nat Commun 2024; 15:7762. [PMID: 39237488 PMCID: PMC11377776 DOI: 10.1038/s41467-024-51382-x] [Received: 03/11/2024] [Accepted: 08/06/2024] [Indexed: 09/07/2024]
Abstract
The abundance of unpaired multimodal single-cell data has motivated a growing body of research into the development of diagonal integration methods. However, the state-of-the-art suffers from the loss of biological information due to feature conversion and struggles with modality-specific populations. To overcome these crucial limitations, we here introduce scConfluence, a method for single-cell diagonal integration. scConfluence combines uncoupled autoencoders on the complete set of features with regularized Inverse Optimal Transport on weakly connected features. We extensively benchmark scConfluence in several single-cell integration scenarios proving that it outperforms the state-of-the-art. We then demonstrate the biological relevance of scConfluence in three applications. We predict spatial patterns for Scgn, Synpr and Olah in scRNA-smFISH integration. We improve the classification of B cells and Monocytes in highly heterogeneous scRNA-scATAC-CyTOF integration. Finally, we reveal the joint contribution of Fezf2 and apical dendrite morphology in Intra Telencephalic neurons, based on morphological images and scRNA.
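scConfluence relies on regularized *inverse* optimal transport, which the abstract does not detail. As a much simpler, swapped-in illustration of the transport idea only, the sketch below computes a forward entropy-regularized OT plan with Sinkhorn iterations between two small sets of cell embeddings. The embeddings, uniform cell weights, and `eps` value are assumptions, and this is not the authors' method.

```python
import numpy as np

def sinkhorn(cost, a, b, eps=0.5, n_iter=1000):
    """Entropy-regularized optimal transport plan between weight vectors a and b."""
    kernel = np.exp(-cost / eps)        # Gibbs kernel; smaller eps gives a sharper plan
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (kernel @ v)            # rescale rows toward marginal a
        v = b / (kernel.T @ u)          # rescale columns to marginal b
    return u[:, None] * kernel * v[None, :]

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))             # hypothetical embeddings, modality 1 (5 cells)
y = rng.normal(size=(4, 3))             # hypothetical embeddings, modality 2 (4 cells)
cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
cost = cost / cost.max()                # scale costs into [0, 1]
a = np.full(5, 1 / 5)                   # uniform weight per cell
b = np.full(4, 1 / 4)
plan = sinkhorn(cost, a, b)
print(plan.round(3))
```

Each entry `plan[i, j]` is the mass moved from cell i of one modality to cell j of the other, and rows and columns sum to the chosen cell weights. Inverse OT is a different problem that this sketch does not attempt.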
Affiliation(s)
- Jules Samaran
- Institut Pasteur, Université Paris Cité, CNRS UMR 3738, Machine Learning for Integrative Genomics Group, Paris, France
- Gabriel Peyré
- CNRS and DMA de l'Ecole Normale Supérieure, CNRS, Ecole Normale Supérieure, Université PSL, Paris, France
- Laura Cantini
- Institut Pasteur, Université Paris Cité, CNRS UMR 3738, Machine Learning for Integrative Genomics Group, Paris, France.
11
Liao L, Martin PCN, Kim H, Panahandeh S, Won KJ. Data enhancement in the age of spatial biology. Adv Cancer Res 2024; 163:39-70. [PMID: 39271267 DOI: 10.1016/bs.acr.2024.06.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/15/2024]
Abstract
Unveiling the intricate interplay of cells in their native environment lies at the heart of understanding fundamental biological processes and unraveling disease mechanisms, particularly in complex diseases like cancer. Spatial transcriptomics (ST) offers a revolutionary lens into the spatial organization of gene expression within tissues, empowering researchers to study both cell heterogeneity and microenvironments in health and disease. However, current ST technologies often face limitations in either resolution or the number of genes profiled simultaneously. Integrating ST data with complementary sources, such as single-cell transcriptomics and detailed tissue staining images, presents a powerful solution to overcome these limitations. This review delves into the computational approaches driving the integration of spatial transcriptomics with other data types. By illuminating the key challenges and outlining the current algorithmic solutions, we aim to highlight the immense potential of these methods to revolutionize our understanding of cancer biology.
Affiliation(s)
- Linbu Liao
- Biotech Research and Innovation Centre (BRIC), University of Copenhagen, Denmark; Samuel Oschin Cancer Center, Cedars-Sinai Medical Center, Los Angeles, CA, United States
- Patrick C N Martin
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, CA, United States
- Hyobin Kim
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, CA, United States
- Sanaz Panahandeh
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, CA, United States
- Kyoung Jae Won
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, CA, United States.
12
Verhey TB, Seo H, Gillmor A, Thoppey-Manoharan V, Schriemer D, Morrissy S. mosaicMPI: a framework for modular data integration across cohorts and -omics modalities. Nucleic Acids Res 2024; 52:e53. [PMID: 38813827 PMCID: PMC11229337 DOI: 10.1093/nar/gkae442] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2023] [Revised: 04/26/2024] [Accepted: 05/10/2024] [Indexed: 05/31/2024]
Abstract
Advances in molecular profiling have facilitated generation of large multi-modal datasets that can potentially reveal critical axes of biological variation underlying complex diseases. Distilling biological meaning, however, requires computational strategies that can perform mosaic integration across diverse cohorts and datatypes. Here, we present mosaicMPI, a framework for discovery of low to high-resolution molecular programs representing both cell types and states, and integration within and across datasets into a network representing biological themes. Using existing datasets in glioblastoma, we demonstrate that this approach robustly integrates single cell and bulk programs across multiple platforms. Clinical and molecular annotations from cohorts are statistically propagated onto this network of programs, yielding a richly characterized landscape of biological themes. This enables deep understanding of individual tumor samples, systematic exploration of relationships between modalities, and generation of a reference map onto which new datasets can rapidly be mapped. mosaicMPI is available at https://github.com/MorrissyLab/mosaicMPI.
Affiliation(s)
- Theodore B Verhey
- Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, Alberta, Canada
- Charbonneau Cancer Institute, University of Calgary, Calgary, Alberta, Canada
- Alberta Children's Hospital Research Institute, University of Calgary, Calgary, Alberta, Canada
- Heewon Seo
- Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, Alberta, Canada
- Charbonneau Cancer Institute, University of Calgary, Calgary, Alberta, Canada
- Aaron Gillmor
- Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, Alberta, Canada
- Charbonneau Cancer Institute, University of Calgary, Calgary, Alberta, Canada
- Varsha Thoppey-Manoharan
- Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, Alberta, Canada
- Charbonneau Cancer Institute, University of Calgary, Calgary, Alberta, Canada
- David Schriemer
- Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, Alberta, Canada
- Charbonneau Cancer Institute, University of Calgary, Calgary, Alberta, Canada
- Sorana Morrissy
- Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, Alberta, Canada
- Charbonneau Cancer Institute, University of Calgary, Calgary, Alberta, Canada
- Alberta Children's Hospital Research Institute, University of Calgary, Calgary, Alberta, Canada
13
Lotfollahi M, Hao Y, Theis FJ, Satija R. The future of rapid and automated single-cell data analysis using reference mapping. Cell 2024; 187:2343-2358. [PMID: 38729109 PMCID: PMC11184658 DOI: 10.1016/j.cell.2024.03.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2023] [Revised: 03/05/2024] [Accepted: 03/08/2024] [Indexed: 05/12/2024]
Abstract
As the number of single-cell datasets continues to grow rapidly, workflows that map new data to well-curated reference atlases offer enormous promise for the biological community. In this perspective, we discuss key computational challenges and opportunities for single-cell reference-mapping algorithms. We discuss how mapping algorithms will enable the integration of diverse datasets across disease states, molecular modalities, genetic perturbations, and diverse species and will eventually replace manual and laborious unsupervised clustering pipelines.
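The perspective describes mapping new data onto curated reference atlases. The simplest form of that idea, assuming query and reference cells already sit in a shared embedding (the hard part that mapping algorithms address), is k-nearest-neighbor label transfer. The sketch below is a generic illustration with hypothetical coordinates and labels, not any specific tool discussed in the article:

```python
from collections import Counter

import numpy as np

def knn_label_transfer(ref_emb, ref_labels, query_emb, k=3):
    """Assign each query cell the majority label of its k nearest reference cells."""
    # Pairwise squared Euclidean distances, shape (n_query, n_ref)
    d2 = ((query_emb[:, None, :] - ref_emb[None, :, :]) ** 2).sum(axis=-1)
    nearest = np.argsort(d2, axis=1)[:, :k]
    return [Counter(ref_labels[j] for j in row).most_common(1)[0][0] for row in nearest]

# Hypothetical 2-D reference embedding with two annotated populations
ref_emb = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
ref_labels = ["T cell"] * 3 + ["B cell"] * 3
query_emb = np.array([[0.05, 0.05], [4.9, 5.2]])
print(knn_label_transfer(ref_emb, ref_labels, query_emb))  # ['T cell', 'B cell']
```

Majority voting keeps the example short; practical mapping tools typically also report a confidence score so that query cells unlike anything in the reference can be flagged rather than forced into an existing label.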
Affiliation(s)
- Mohammad Lotfollahi
- Institute of Computational Biology, Helmholtz Center Munich - German Research Center for Environmental Health, Neuherberg, Germany; Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
- Yuhan Hao
- Center for Genomics and Systems Biology, New York University, New York, NY, USA; New York Genome Center, New York, NY, USA
- Fabian J Theis
- Institute of Computational Biology, Helmholtz Center Munich - German Research Center for Environmental Health, Neuherberg, Germany; Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK; Department of Mathematics, Technical University of Munich, Garching, Germany.
- Rahul Satija
- Center for Genomics and Systems Biology, New York University, New York, NY, USA; New York Genome Center, New York, NY, USA.
14
Zhou H, Hu Y, Liu S, Zhou G, Xu J, Chen A, Wang Y, Li L, Hu Y. A Precise Framework for Rice Leaf Disease Image-Text Retrieval Using FHTW-Net. Plant Phenomics 2024; 6:0168. [PMID: 38666226 PMCID: PMC11045261 DOI: 10.34133/plantphenomics.0168] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/12/2023] [Accepted: 03/13/2024] [Indexed: 04/28/2024]
Abstract
Cross-modal retrieval for rice leaf diseases is crucial for prevention, providing agricultural experts with data-driven decision support to address disease threats and safeguard rice production. To overcome the limitations of current crop leaf disease retrieval frameworks, we focused on four common rice leaf diseases and established the first cross-modal rice leaf disease retrieval dataset (CRLDRD). We introduced cross-modal retrieval to the domain of rice leaf disease retrieval and introduced FHTW-Net, a framework for rice leaf disease image-text retrieval. To address the challenge of matching diverse image categories with complex text descriptions during the retrieval process, we initially employed ViT and BERT to extract fine-grained image and text feature sequences enriched with contextual information. Subsequently, two-way mixed self-attention (TMS) was introduced to enhance both image and text feature sequences, with the aim of uncovering important semantic information in both modalities. Then, we developed false-negative elimination-hard negative mining (FNE-HNM) strategy to facilitate in-depth exploration of semantic connections between different modalities. This strategy aids in selecting challenging negative samples for elimination to constrain the model within the triplet loss function. Finally, we introduced warm-up bat algorithm (WBA) for learning rate optimization, which improves the model's convergence speed and accuracy. Experimental results demonstrated that FHTW-Net outperforms state-of-the-art models. In image-to-text retrieval, it achieved R@1, R@5, and R@10 accuracies of 83.5%, 92%, and 94%, respectively, while in text-to-image retrieval, it achieved accuracies of 82.5%, 98%, and 98.5%, respectively. FHTW-Net offers advanced technical support and algorithmic guidance for cross-modal retrieval of rice leaf diseases.
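FHTW-Net is evaluated with R@1, R@5, and R@10. A minimal sketch of Recall@K, assuming exactly one correct caption per image (the CRLDRD protocol may pair several texts with each image, which this sketch does not handle) and a hypothetical similarity matrix:

```python
import numpy as np

def recall_at_k(sim, k):
    """Fraction of queries whose ground-truth match (index i for query i) ranks in the top k."""
    # Rank candidates for each query by descending similarity
    ranking = np.argsort(-sim, axis=1)
    hits = [i in ranking[i, :k] for i in range(sim.shape[0])]
    return float(np.mean(hits))

# Hypothetical image-to-text similarities: row i = image i, column j = caption j,
# with caption i as the correct match for image i.
sim = np.array([[0.9, 0.1, 0.3],
                [0.2, 0.4, 0.8],
                [0.1, 0.7, 0.6]])
for k in (1, 2, 3):
    print(k, recall_at_k(sim, k))
```

Here R@1 is 1/3 (only image 0 ranks its own caption first) and R@2 is 1.0. Text-to-image recall is the same computation applied to `sim.T`.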
Affiliation(s)
- Hongliang Zhou
- College of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha 410004, Hunan, China
- Yufan Hu
- College of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha 410004, Hunan, China
- Shuai Liu
- College of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha 410004, Hunan, China
- Guoxiong Zhou
- College of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha 410004, Hunan, China
- Jiaxin Xu
- College of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha 410004, Hunan, China
- Aibin Chen
- College of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha 410004, Hunan, China
- Yanfeng Wang
- National University of Defense Technology, Changsha 410015, Hunan, China
- Liujun Li
- Department of Soil and Water Systems, University of Idaho, Moscow, ID 83844, USA
- Yahui Hu
- Plant Protection Research Institute, Academy of Agricultural Sciences, Changsha 410125, Hunan, China
15
Xu J, Zhou H, Hu Y, Xue Y, Zhou G, Li L, Dai W, Li J. High-Accuracy Tomato Leaf Disease Image-Text Retrieval Method Utilizing LAFANet. Plants (Basel) 2024; 13:1176. [PMID: 38732391 PMCID: PMC11085479 DOI: 10.3390/plants13091176] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2024] [Revised: 04/17/2024] [Accepted: 04/20/2024] [Indexed: 05/13/2024]
Abstract
Tomato leaf disease control in the field of smart agriculture urgently requires attention and reinforcement. This paper proposes a method called LAFANet for image-text retrieval, which integrates image and text information for joint analysis of multimodal data, helping agricultural practitioners obtain more comprehensive and in-depth diagnostic evidence to ensure the quality and yield of tomatoes. First, we focus on six common tomato leaf disease images and text descriptions, creating a Tomato Leaf Disease Image-Text Retrieval Dataset (TLDITRD) and introducing image-text retrieval into the field of tomato leaf disease retrieval. Then, utilizing ViT and BERT models, we extract detailed image features and sequences of textual features, incorporating contextual information from image-text pairs. To address errors in image-text retrieval caused by complex backgrounds, we propose Learnable Fusion Attention (LFA) to amplify the fusion of textual and image features, thereby extracting substantial semantic insights from both modalities. To delve further into the semantic connections across various modalities, we propose a False Negative Elimination-Adversarial Negative Selection (FNE-ANS) approach. This method aims to identify adversarial negative instances that specifically target false negatives within the triplet loss function, thereby imposing constraints on the model. To bolster the model's capacity for generalization and precision, we propose Adversarial Regularization (AR). This approach involves incorporating adversarial perturbations during model training, thereby fortifying its resilience and adaptability to slight variations in input data. Experimental results show that LAFANet outperformed existing state-of-the-art models on the TLDITRD dataset, with top1, top5, and top10 reaching 83.3% and 90.0%, and top1, top5, and top10 reaching 80.3%, 93.7%, and 96.3%. LAFANet offers fresh technical backing and algorithmic insights for the retrieval of tomato leaf disease through image-text correlation.
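Both FHTW-Net and LAFANet train with a triplet loss plus a negative-selection step meant to exclude false negatives before hard negatives are chosen. The abstracts do not give the exact rules, so the sketch below uses a generic hinge triplet loss with the hardest negative per anchor, and a hypothetical similarity threshold (`fn_threshold`) as a stand-in for false-negative elimination; it is not FNE-HNM or FNE-ANS.

```python
import numpy as np

def triplet_loss_hard_negative(sim, margin=0.2, fn_threshold=None):
    """Hinge triplet loss (image-to-text direction) using the hardest negative per image.

    sim[i, j] is the similarity between image i and text j; the diagonal holds the
    positive pairs. If fn_threshold is given, off-diagonal pairs scoring above it are
    treated as suspected false negatives and excluded from mining (hypothetical rule).
    """
    n = sim.shape[0]
    pos = np.diag(sim)
    neg = sim.astype(float).copy()
    neg[np.eye(n, dtype=bool)] = -np.inf          # never pick the positive as a negative
    if fn_threshold is not None:
        neg[neg > fn_threshold] = -np.inf         # drop suspected false negatives
    hardest = neg.max(axis=1)                     # hardest remaining negative per image
    losses = np.maximum(0.0, margin - pos + hardest)  # zero if every negative was masked
    return float(losses.mean())

# Hypothetical similarities; text 0 scores 0.95 against image 2, higher than text 2 does
sim = np.array([[0.80, 0.75, 0.10],
                [0.20, 0.90, 0.30],
                [0.95, 0.40, 0.70]])
print(triplet_loss_hard_negative(sim))
print(triplet_loss_hard_negative(sim, fn_threshold=0.9))
```

Masking the 0.95 pair, which may be a near-duplicate description rather than a true mismatch, lowers the loss from 0.2 to 0.05. A training setup would also add the text-to-image direction and compute `sim` from learned embeddings.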
Affiliation(s)
- Jiaxin Xu
- College of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha 410004, China
- Hongliang Zhou
- College of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha 410004, China
- Yufan Hu
- College of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha 410004, China
- Yongfei Xue
- College of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha 410004, China
- Guoxiong Zhou
- College of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha 410004, China
- Liujun Li
- Department of Soil and Water Systems, University of Idaho, Moscow, ID 83844, USA
- Weisi Dai
- College of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha 410004, China
- Jinyang Li
- College of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha 410004, China
16
Putri GH, Howitt G, Marsh-Wakefield F, Ashhurst TM, Phipson B. SuperCellCyto: enabling efficient analysis of large scale cytometry datasets. Genome Biol 2024; 25:89. [PMID: 38589921 PMCID: PMC11003185 DOI: 10.1186/s13059-024-03229-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Accepted: 03/27/2024] [Indexed: 04/10/2024]
Abstract
Advancements in cytometry technologies have enabled quantification of up to 50 proteins across millions of cells at single cell resolution. Analysis of cytometry data routinely involves tasks such as data integration, clustering, and dimensionality reduction. While numerous tools exist, many require extensive run times when processing large cytometry data containing millions of cells. Existing solutions, such as random subsampling, are inadequate as they risk excluding rare cell subsets. To address this, we propose SuperCellCyto, an R package that builds on the SuperCell tool which groups highly similar cells into supercells. SuperCellCyto is available on GitHub (https://github.com/phipsonlab/SuperCellCyto) and Zenodo (https://doi.org/10.5281/zenodo.10521294).
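SuperCellCyto groups highly similar cells into supercells so downstream tools run on far fewer points while every cell stays represented, unlike random subsampling. The sketch below illustrates that grouping-then-averaging idea with plain k-means as a swapped-in grouping step (SuperCell itself uses a graph-based approach), a hypothetical `gamma` parameter setting the rough number of cells per supercell, and synthetic marker data; it is not the package's R API.

```python
import numpy as np

def make_supercells(X, gamma=20, n_iter=20, seed=0):
    """Group cells into about len(X) // gamma supercells using k-means.

    Returns a supercell index per cell and the mean marker profile of each supercell.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n_super = max(1, X.shape[0] // gamma)
    centers = X[rng.choice(X.shape[0], size=n_super, replace=False)]
    for _ in range(n_iter):
        # Assign each cell to its nearest center (squared Euclidean distance)
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        for c in range(n_super):
            members = X[labels == c]
            if len(members):            # leave a center in place if it lost all cells
                centers[c] = members.mean(axis=0)
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    labels = d2.argmin(axis=1)
    kept = np.unique(labels)             # drop supercells that ended up empty
    labels = np.searchsorted(kept, labels)
    profiles = np.vstack([X[labels == c].mean(axis=0) for c in range(len(kept))])
    return labels, profiles

rng = np.random.default_rng(1)
# Hypothetical data: 1,000 cells x 5 markers drawn from two populations
X = np.vstack([rng.normal(0.0, 1.0, size=(500, 5)), rng.normal(6.0, 1.0, size=(500, 5))])
labels, profiles = make_supercells(X, gamma=20)
print(X.shape[0], "cells ->", profiles.shape[0], "supercells")
```

The distance array here is n_cells × n_supercells × n_markers, so this toy version will not scale to millions of cells; it only shows how each cell keeps a supercell assignment that downstream results can be mapped back through.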
Affiliation(s)
- Givanna H Putri
- The Walter and Eliza Hall Institute of Medical Research and The Department of Medical Biology, The University of Melbourne, Parkville, VIC, Australia.
- George Howitt
- Peter MacCallum Cancer Centre and The Sir Peter MacCallum Department of Oncology, The University of Melbourne, Parkville, VIC, Australia
- Felix Marsh-Wakefield
- Centenary Institute of Cancer Medicine and Cell Biology, The University of Sydney, Sydney, NSW, Australia
- Thomas M Ashhurst
- Sydney Cytometry Core Research Facility and School of Medical Sciences, The University of Sydney, Sydney, NSW, Australia
- Belinda Phipson
- The Walter and Eliza Hall Institute of Medical Research and The Department of Medical Biology, The University of Melbourne, Parkville, VIC, Australia.
17
Zheng Y, Li Y, Zhou K, Li T, VanDusen NJ, Hua Y. Precise genome-editing in human diseases: mechanisms, strategies and applications. Signal Transduct Target Ther 2024; 9:47. [PMID: 38409199 PMCID: PMC10897424 DOI: 10.1038/s41392-024-01750-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Revised: 01/15/2024] [Accepted: 01/17/2024] [Indexed: 02/28/2024]
Abstract
Precise genome-editing platforms are versatile tools for generating specific, site-directed DNA insertions, deletions, and substitutions. The continuous enhancement of these tools has led to a revolution in the life sciences, which promises to deliver novel therapies for genetic disease. Precise genome-editing can be traced back to the 1950s with the discovery of DNA's double-helix and, after 70 years of development, has evolved from crude in vitro applications to a wide range of sophisticated capabilities, including in vivo applications. Nonetheless, precise genome-editing faces constraints such as modest efficiency, delivery challenges, and off-target effects. In this review, we explore precise genome-editing, with a focus on introduction of the landmark events in its history, various platforms, delivery systems, and applications. First, we discuss the landmark events in the history of precise genome-editing. Second, we describe the current state of precise genome-editing strategies and explain how these techniques offer unprecedented precision and versatility for modifying the human genome. Third, we introduce the current delivery systems used to deploy precise genome-editing components through DNA, RNA, and RNPs. Finally, we summarize the current applications of precise genome-editing in labeling endogenous genes, screening genetic variants, molecular recording, generating disease models, and gene therapy, including ex vivo therapy and in vivo therapy, and discuss potential future advances.
Affiliation(s)
- Yanjiang Zheng
- Key Laboratory of Birth Defects and Related Diseases of Women and Children of MOE, Department of Pediatrics, West China Second University Hospital, Sichuan University, Chengdu, Sichuan, 610041, China
- Yifei Li
- Key Laboratory of Birth Defects and Related Diseases of Women and Children of MOE, Department of Pediatrics, West China Second University Hospital, Sichuan University, Chengdu, Sichuan, 610041, China
- Kaiyu Zhou
- Key Laboratory of Birth Defects and Related Diseases of Women and Children of MOE, Department of Pediatrics, West China Second University Hospital, Sichuan University, Chengdu, Sichuan, 610041, China
- Tiange Li
- Department of Cardiovascular Surgery, West China Hospital, Sichuan University, Chengdu, Sichuan, 610041, China
- Nathan J VanDusen
- Department of Pediatrics, Herman B Wells Center for Pediatric Research, Indiana University School of Medicine, Indianapolis, IN, 46202, USA.
- Yimin Hua
- Key Laboratory of Birth Defects and Related Diseases of Women and Children of MOE, Department of Pediatrics, West China Second University Hospital, Sichuan University, Chengdu, Sichuan, 610041, China.
18
Ghazanfar S, Guibentif C, Marioni JC. Stabilized mosaic single-cell data integration using unshared features. Nat Biotechnol 2024; 42:284-292. [PMID: 37231260 PMCID: PMC10869270 DOI: 10.1038/s41587-023-01766-z] [Citation(s) in RCA: 28] [Impact Index Per Article: 28.0] [Received: 02/10/2022] [Accepted: 03/28/2023] [Indexed: 05/27/2023]
Abstract
Currently available single-cell omics technologies capture many unique features with different biological information content. Data integration aims to place cells, captured with different technologies, onto a common embedding to facilitate downstream analytical tasks. Current horizontal data integration techniques use a set of common features, thereby ignoring non-overlapping features and losing information. Here we introduce StabMap, a mosaic data integration technique that stabilizes mapping of single-cell data by exploiting the non-overlapping features. StabMap first infers a mosaic data topology based on shared features, then projects all cells onto supervised or unsupervised reference coordinates by traversing shortest paths along the topology. We show that StabMap performs well in various simulation contexts, facilitates 'multi-hop' mosaic data integration where some datasets do not share any features and enables the use of spatial gene expression features for mapping dissociated single-cell data onto a spatial transcriptomic reference.
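StabMap infers a mosaic data topology from shared features and traverses shortest paths along it, which is what permits "multi-hop" integration between datasets that share no features. The sketch below shows only that graph step under simple assumptions (datasets are nodes, an edge joins any two datasets sharing at least one feature, and path length counts hops); the dataset names and feature sets are hypothetical, and the projection performed along each path is not shown.

```python
from collections import deque

def mosaic_topology(features):
    """Connect two datasets when they share at least one feature."""
    names = list(features)
    graph = {name: set() for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if features[a] & features[b]:
                graph[a].add(b)
                graph[b].add(a)
    return graph

def shortest_path(graph, start, goal):
    """Breadth-first search for the fewest-hop path from start to goal."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in sorted(graph[node]):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None  # goal unreachable: no chain of shared features

# Hypothetical mosaic: RNA-only and ATAC-only data share no features, but a multiome set bridges them
features = {
    "rna_only": {"geneA", "geneB", "geneC"},
    "multiome": {"geneA", "geneB", "peak1", "peak2"},
    "atac_only": {"peak1", "peak2", "peak3"},
}
graph = mosaic_topology(features)
print(shortest_path(graph, "atac_only", "rna_only"))  # ['atac_only', 'multiome', 'rna_only']
```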
Affiliation(s)
- Shila Ghazanfar
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK.
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, UK.
- School of Mathematics and Statistics, The University of Sydney, Camperdown, New South Wales, Australia.
- Charles Perkins Centre, The University of Sydney, Camperdown, New South Wales, Australia.
- Carolina Guibentif
- Sahlgrenska Center for Cancer Research, Inst. Biomedicine, Dept. Microbiology and Immunology, University of Gothenburg, Gothenburg, Sweden
- John C Marioni
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK.
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, UK.
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK.
19
Souquette A, Thomas PG. Variation in the basal immune state and implications for disease. eLife 2024; 13:e90091. [PMID: 38275224 PMCID: PMC10817719 DOI: 10.7554/elife.90091] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Accepted: 01/21/2024] [Indexed: 01/27/2024]
Abstract
Analyses of pre-existing immunity and its effects on acute infection often focus on memory responses associated with a prior infectious exposure. However, memory responses occur in the context of the overall immune state and leukocytes must interact with their microenvironment and other immune cells. Thus, it is important to also consider non-antigen-specific factors which shape the composite basal state and functional capacity of the immune system, termed here as I0 ('I naught'). In this review, we discuss the determinants of I0. Utilizing influenza virus as a model, we then consider the effect of I0 on susceptibility to infection and disease severity. Lastly, we outline a mathematical framework and demonstrate how researchers can build and tailor models to specific needs. Understanding how diverse factors uniquely and collectively impact immune competence will provide valuable insights into mechanisms of immune variation, aid in screening for high-risk populations, and promote the development of broadly applicable prophylactic and therapeutic treatments.
Affiliation(s)
- Aisha Souquette
- Department of Immunology, St. Jude Children's Research Hospital, Memphis, United States
- Paul G Thomas
- Department of Immunology, St. Jude Children's Research Hospital, Memphis, United States
20
Guo ZH, Wang YB, Wang S, Zhang Q, Huang DS. scCorrector: a robust method for integrating multi-study single-cell data. Brief Bioinform 2024; 25:bbad525. [PMID: 38271483 PMCID: PMC10810333 DOI: 10.1093/bib/bbad525] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Revised: 11/12/2023] [Accepted: 12/19/2023] [Indexed: 01/27/2024]
Abstract
The advent of single-cell sequencing technologies has revolutionized cell biology studies. However, integrative analyses of diverse single-cell data face serious challenges, including technological noise, sample heterogeneity, and different modalities and species. To address these problems, we propose scCorrector, a variational autoencoder-based model that can integrate single-cell data from different studies and map them into a common space. Specifically, we designed a Study Specific Adaptive Normalization for each study in the decoder to implement these features. scCorrector achieves competitive and robust performance compared with state-of-the-art methods and brings novel insights under various circumstances (e.g. various batches, multi-omics, cross-species, and development stages). In addition, integrating single-cell data with spatial data makes it possible to transfer information between different studies, which greatly expands the narrow range of genes covered by MERFISH technology. In summary, scCorrector can efficiently integrate multi-study single-cell datasets, thereby providing broad opportunities to tackle challenges emerging from noisy resources.
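The abstract names a Study Specific Adaptive Normalization in the decoder but does not define it. One plausible reading, assumed here and not confirmed by the source, is conditional normalization: each cell's hidden activations are standardized, then scaled and shifted by parameters that belong to that cell's study. A minimal NumPy sketch with hypothetical fixed parameters:

```python
import numpy as np

def study_adaptive_norm(h, study, gamma, beta, eps=1e-5):
    """Standardize each cell's activations, then apply its study's scale and shift.

    h:      (n_cells, n_hidden) decoder activations
    study:  (n_cells,) integer study index per cell
    gamma:  (n_studies, n_hidden) per-study scale (learned in practice, fixed here)
    beta:   (n_studies, n_hidden) per-study shift (learned in practice, fixed here)
    """
    mean = h.mean(axis=1, keepdims=True)
    var = h.var(axis=1, keepdims=True)
    h_norm = (h - mean) / np.sqrt(var + eps)
    return gamma[study] * h_norm + beta[study]

rng = np.random.default_rng(0)
h = rng.normal(size=(4, 3))                  # 4 cells, 3 hidden units
study = np.array([0, 0, 1, 1])               # first two cells from study 0, last two from study 1
gamma = np.array([[1.0, 1.0, 1.0], [3.0, 3.0, 3.0]])
beta = np.array([[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]])
out = study_adaptive_norm(h, study, gamma, beta)
print(out.mean(axis=1).round(6))
```

Placing the study-specific parameters in the decoder would let the shared latent space stay study-agnostic while each study's output distribution is modeled separately; whether scCorrector works exactly this way is an assumption.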
Affiliation(s)
- Zhen-Hao Guo
- College of Electronics and Information Engineering, Tongji University, Shanghai 200000, China
- Yan-Bin Wang
- College of Computer Science and Technology, Zhejiang University 310027, China
- Siguo Wang
- Eastern Institute for Advanced Study, Eastern Institute of Technology, Tongxin Road No.568, Ningbo, Zhejiang 315201, China
- Qinhu Zhang
- Eastern Institute for Advanced Study, Eastern Institute of Technology, Tongxin Road No.568, Ningbo, Zhejiang 315201, China
- De-Shuang Huang
- Eastern Institute for Advanced Study, Eastern Institute of Technology, Tongxin Road No.568, Ningbo, Zhejiang 315201, China
21
Athaya T, Ripan RC, Li X, Hu H. Multimodal deep learning approaches for single-cell multi-omics data integration. Brief Bioinform 2023; 24:bbad313. [PMID: 37651607 PMCID: PMC10516349 DOI: 10.1093/bib/bbad313] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/08/2023] [Revised: 06/23/2023] [Accepted: 07/18/2023] [Indexed: 09/02/2023]
Abstract
Integrating single-cell multi-omics data is a challenging task that has led to new insights into complex cellular systems. Various computational methods have been proposed to effectively integrate these rapidly accumulating datasets, including deep learning. However, despite the proven success of deep learning in integrating multi-omics data and its better performance over classical computational methods, there has been no systematic study of its application to single-cell multi-omics data integration. To fill this gap, we conducted a literature review to explore the use of multimodal deep learning techniques in single-cell multi-omics data integration, taking into account recent studies from multiple perspectives. Specifically, we first summarized different modalities found in single-cell multi-omics data. We then reviewed current deep learning techniques for processing multimodal data and categorized deep learning-based integration methods for single-cell multi-omics data according to data modality, deep learning architecture, fusion strategy, key tasks and downstream analysis. Finally, we provided insights into using these deep learning models to integrate multi-omics data and better understand single-cell biological mechanisms.
Affiliation(s)
- Tasbiraha Athaya
- Department of Computer Science, University of Central Florida, Orlando, Florida, United States of America
- Rony Chowdhury Ripan
- Department of Computer Science, University of Central Florida, Orlando, Florida, United States of America
- Xiaoman Li
- Burnett School of Biomedical Science, College of Medicine, University of Central Florida, Orlando, Florida, United States of America
- Haiyan Hu
- Department of Computer Science, University of Central Florida, Orlando, Florida, United States of America
22
Derbois C, Palomares MA, Deleuze JF, Cabannes E, Bonnet E. Single cell transcriptome sequencing of stimulated and frozen human peripheral blood mononuclear cells. Sci Data 2023; 10:433. [PMID: 37414801 PMCID: PMC10326076 DOI: 10.1038/s41597-023-02348-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/27/2023] [Accepted: 06/29/2023] [Indexed: 07/08/2023]
Abstract
Peripheral blood mononuclear cells (PBMCs) are blood cells that are a critical part of the immune system used to fight off infection, defending our bodies from harmful pathogens. In biomedical research, PBMCs are commonly used to study global immune response to disease outbreak and progression, pathogen infections, for vaccine development and a multitude of other clinical applications. Over the past few years, the revolution in single-cell RNA sequencing (scRNA-seq) has enabled an unbiased quantification of gene expression in thousands of individual cells, which provides a more efficient tool to decipher the immune system in human diseases. In this work, we generate scRNA-seq data from human PBMCs at high sequencing depth (>100,000 reads/cell) for more than 30,000 cells, in resting, stimulated, fresh and frozen conditions. The data generated can be used for benchmarking batch correction and data integration methods, and to study the effect of freezing-thawing cycles on the quality of immune cell populations and their transcriptomic profiles.
Affiliation(s)
- Céline Derbois
- Centre National de Recherche en Génomique Humaine (CNRGH), Institut de Biologie François Jacob, CEA, Université Paris-Saclay, Evry, France
- Marie-Ange Palomares
- Centre National de Recherche en Génomique Humaine (CNRGH), Institut de Biologie François Jacob, CEA, Université Paris-Saclay, Evry, France
- Jean-François Deleuze
- Centre National de Recherche en Génomique Humaine (CNRGH), Institut de Biologie François Jacob, CEA, Université Paris-Saclay, Evry, France
- Eric Cabannes
- Centre National de Recherche en Génomique Humaine (CNRGH), Institut de Biologie François Jacob, CEA, Université Paris-Saclay, Evry, France
- Eric Bonnet
- Centre National de Recherche en Génomique Humaine (CNRGH), Institut de Biologie François Jacob, CEA, Université Paris-Saclay, Evry, France
23
Fernández-Moya SM, Ganesh AJ, Plass M. Neural cell diversity in the light of single-cell transcriptomics. Transcription 2023; 14:158-176. [PMID: 38229529 PMCID: PMC10807474 DOI: 10.1080/21541264.2023.2295044] [Received: 06/27/2023] [Revised: 10/02/2023] [Accepted: 11/10/2023]
Abstract
The development of highly parallel and affordable high-throughput single-cell transcriptomics technologies has revolutionized our understanding of brain complexity. These methods have been used to build cellular maps of the brain and its different regions, and to catalog the diversity of cells in each of them during development, aging and even in disease. We now know that cellular diversity far exceeds what was previously thought. Single-cell transcriptomics analyses have revealed that cell types previously considered homogeneous based on imaging techniques differ depending on several factors, including sex, age and location within the brain. The expression profiles of these cells have also been exploited to identify the regulatory programs behind cellular diversity and to decipher the transcriptional pathways driving them. In this review, we summarize how single-cell transcriptomics has changed our view of cellular diversity in the human brain, and how it could change the way we study neurodegenerative diseases. Moreover, we describe new computational approaches that can be used to study cellular differentiation and to gain insight into the functions of individual cell populations under different conditions and their alterations in disease.
Affiliation(s)
- Sandra María Fernández-Moya
- Gene Regulation of Cell Identity, Regenerative Medicine Program, Bellvitge Institute for Biomedical Research (IDIBELL), Barcelona, L’Hospitalet de Llobregat, Spain
- Program for Advancing Clinical Translation of Regenerative Medicine of Catalonia, P-CMR[C], Barcelona, L’Hospitalet de Llobregat, Spain
- Akshay Jaya Ganesh
- Gene Regulation of Cell Identity, Regenerative Medicine Program, Bellvitge Institute for Biomedical Research (IDIBELL), Barcelona, L’Hospitalet de Llobregat, Spain
- Program for Advancing Clinical Translation of Regenerative Medicine of Catalonia, P-CMR[C], Barcelona, L’Hospitalet de Llobregat, Spain
- Mireya Plass
- Gene Regulation of Cell Identity, Regenerative Medicine Program, Bellvitge Institute for Biomedical Research (IDIBELL), Barcelona, L’Hospitalet de Llobregat, Spain
- Program for Advancing Clinical Translation of Regenerative Medicine of Catalonia, P-CMR[C], Barcelona, L’Hospitalet de Llobregat, Spain
- Center for Networked Biomedical Research on Bioengineering, Biomaterials and Nanomedicine (CIBER-BBN), Madrid, Spain
24
Miranda AMA, Janbandhu V, Maatz H, Kanemaru K, Cranley J, Teichmann SA, Hübner N, Schneider MD, Harvey RP, Noseda M. Single-cell transcriptomics for the assessment of cardiac disease. Nat Rev Cardiol 2023; 20:289-308. [PMID: 36539452 DOI: 10.1038/s41569-022-00805-7] [Accepted: 11/03/2022]
Abstract
Cardiovascular disease is the leading cause of death globally. An advanced understanding of cardiovascular disease mechanisms is required to improve therapeutic strategies and patient risk stratification. State-of-the-art, large-scale, single-cell and single-nucleus transcriptomics facilitate the exploration of the cardiac cellular landscape at an unprecedented level, beyond its descriptive features, and can further our understanding of the mechanisms of disease and guide functional studies. In this Review, we provide an overview of the technical challenges in the experimental design of single-cell and single-nucleus transcriptomics studies, as well as a discussion of the type of inferences that can be made from the data derived from these studies. Furthermore, we describe novel findings derived from transcriptomics studies for each major cardiac cell type in both health and disease, and from development to adulthood. This Review also provides a guide to interpreting the exhaustive list of newly identified cardiac cell types and states, and highlights the consensus and discordances in annotation, indicating an urgent need for standardization. We describe advanced applications such as integration of single-cell data with spatial transcriptomics to map genes and cells on tissue and define cellular microenvironments that regulate homeostasis and disease progression. Finally, we discuss current and future translational and clinical implications of novel transcriptomics approaches, and provide an outlook of how these technologies will change the way we diagnose and treat heart disease.
Affiliation(s)
- Vaibhao Janbandhu
- Victor Chang Cardiac Research Institute, Sydney, NSW, Australia
- School of Clinical Medicine, Faculty of Medicine, UNSW Sydney, Sydney, NSW, Australia
- Henrike Maatz
- Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany
- Kazumasa Kanemaru
- Cellular Genetics Programme, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- James Cranley
- Cellular Genetics Programme, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Sarah A Teichmann
- Cellular Genetics Programme, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Department of Physics, Cavendish Laboratory, University of Cambridge, Cambridge, UK
- Norbert Hübner
- Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany
- Charité-Universitätsmedizin Berlin, Berlin, Germany
- German Center for Cardiovascular Research (DZHK), Partner Site Berlin, Berlin, Germany
- Richard P Harvey
- Victor Chang Cardiac Research Institute, Sydney, NSW, Australia
- School of Clinical Medicine, Faculty of Medicine, UNSW Sydney, Sydney, NSW, Australia
- School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Sydney, NSW, Australia
- Michela Noseda
- National Heart and Lung Institute, Imperial College London, London, UK
25
Steyaert S, Pizurica M, Nagaraj D, Khandelwal P, Hernandez-Boussard T, Gentles AJ, Gevaert O. Multimodal data fusion for cancer biomarker discovery with deep learning. Nat Mach Intell 2023; 5:351-362. [PMID: 37693852 PMCID: PMC10484010 DOI: 10.1038/s42256-023-00633-5] [Received: 07/28/2022] [Accepted: 02/17/2023]
Abstract
Technological advances now make it possible to study a patient from multiple angles with high-dimensional, high-throughput, multi-scale biomedical data. In oncology, massive amounts of data are being generated, ranging from molecular, histopathology and radiology data to clinical records. The introduction of deep learning has significantly advanced the analysis of biomedical data. However, most approaches focus on single data modalities, so progress on methods that integrate complementary data types has been slow. Developing effective multimodal fusion approaches is becoming increasingly important, as a single modality may be neither consistent nor sufficient to capture the heterogeneity of complex diseases, tailor medical care and improve personalised medicine. Many initiatives now focus on integrating these disparate modalities to unravel the biological processes involved in multifactorial diseases such as cancer. However, many obstacles remain, including a lack of usable data and of methods for clinical validation and interpretation. Here, we cover these current challenges and reflect on opportunities through deep learning to tackle data sparsity and scarcity, multimodal interpretability, and standardisation of datasets.
Affiliation(s)
- Sandra Steyaert
- Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University
- Marija Pizurica
- Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University
- Tina Hernandez-Boussard
- Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University
- Department of Biomedical Data Science, Stanford University
- Andrew J Gentles
- Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University
- Department of Biomedical Data Science, Stanford University
- Olivier Gevaert
- Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University
- Department of Biomedical Data Science, Stanford University
26
Zheng Y, Yang X. Spatial RNA sequencing methods show high resolution of single cell in cancer metastasis and the formation of tumor microenvironment. Biosci Rep 2023; 43:BSR20221680. [PMID: 36459212 PMCID: PMC9950536 DOI: 10.1042/bsr20221680] [Received: 08/04/2022] [Revised: 11/30/2022] [Accepted: 12/02/2022]
Abstract
Cancer metastasis often leads to death and therapeutic resistance. This process involves a variety of cellular components, especially cellular and intercellular communication in the tumor microenvironment (TME). Using genetic sequencing technology to comprehensively characterize the tumor and TME is therefore key to understanding metastasis and therapeutic resistance. Spatial transcriptome sequencing enables the localization of gene expression and cell activities in tissue sections. By examining changes in the localization and gene expression of these cells, it is possible to characterize the progress of tumor metastasis and TME formation. With improvements in this technology, spatial transcriptome sequencing has been extended from local regions to whole tissues, and from single sequencing technologies to multimodal analyses combined with a variety of datasets. This has enabled high-resolution detection of every single cell in tissue slides, providing more accurate predictive information for tumor treatments. In this review, we summarize recent studies of new multimodal methods and spatial transcriptome sequencing methods in tumors to illustrate recent developments in the imaging resolution of micro-tissues.
Affiliation(s)
- Yue Zheng
- Department of Biochemistry and Molecular Biology, Basic Medical College, Shanxi Medical University, No. 56, Xinjiang South Road, Yingze street, Yingze District, Taiyuan City, Shanxi Province 030000, China
- Xiaofeng Yang
- Department of Urology, First Hospital of Shanxi Medical University, No. 85, Jiefang South Road, Yingze street, Yingze District, Taiyuan City, Shanxi Province 030000, China
27
Dall’Olio L, Bolognesi M, Borghesi S, Cattoretti G, Castellani G. BRAQUE: Bayesian Reduction for Amplified Quantization in UMAP Embedding. Entropy (Basel) 2023; 25:354. [PMID: 36832720 PMCID: PMC9955093 DOI: 10.3390/e25020354] [Received: 12/09/2022] [Revised: 02/01/2023] [Accepted: 02/10/2023]
Abstract
Single-cell biology has revolutionized the way we understand biological processes. In this paper, we provide a more tailored approach to clustering and analyzing spatial single-cell data from immunofluorescence imaging techniques. We propose Bayesian Reduction for Amplified Quantization in UMAP Embedding (BRAQUE) as a novel integrative approach, from data preprocessing to phenotype classification. BRAQUE starts with an innovative preprocessing step, named Lognormal Shrinkage, which enhances input fragmentation by fitting a lognormal mixture model and shrinking each component towards its median, helping the subsequent clustering step find more clearly separated clusters. BRAQUE's pipeline then performs dimensionality reduction with UMAP and clustering with HDBSCAN on the UMAP embedding. Finally, experts assign clusters to cell types, using effect size measures to rank markers and to identify characterizing markers (Tier 1) and possibly characterizing markers (Tier 2). The total number of cell types in one lymph node detectable with these technologies is unknown and difficult to predict or estimate. Therefore, with BRAQUE, we achieved a higher granularity than other similar algorithms such as PhenoGraph, following the idea that merging similar clusters is easier than splitting unclear ones into clear subclusters.
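The Lognormal Shrinkage step described above can be illustrated with a minimal sketch. This is not BRAQUE's implementation. It makes the following assumptions:

- It handles a single positive marker.
- It fits a Gaussian mixture to log-intensities with plain EM, which is equivalent to a lognormal mixture on the raw values.
- It assigns each cell to its most likely component.
- It pulls each value towards that component's median by a factor `shrink`.

The function names, the quantile-based initialization and the `shrink` parameter are hypothetical.

```python
import numpy as np


def fit_gmm_1d(x, k, n_iter=200):
    """Fit a k-component 1-D Gaussian mixture to x with plain EM.

    Returns the per-point responsibilities (n_points x k).
    """
    # Deterministic initialization (an assumption): spread means over quantiles.
    means = np.quantile(x, np.linspace(0.1, 0.9, k))
    variances = np.full(k, x.var())
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step in log space so that far-away points do not underflow.
        log_p = (np.log(weights)
                 - 0.5 * np.log(2 * np.pi * variances)
                 - (x[:, None] - means) ** 2 / (2 * variances))
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixture weights, means and variances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / len(x)
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk + 1e-6
    return resp


def lognormal_shrinkage(values, k=2, shrink=0.5):
    """Shrink each lognormal component of a positive marker towards its median.

    `shrink` in (0, 1] is a hypothetical knob: 1.0 leaves the log values
    unchanged, smaller values pull each component closer to its median.
    Returns the shrunk log values and the component label of each cell.
    """
    logx = np.log(values)
    labels = fit_gmm_1d(logx, k).argmax(axis=1)
    out = logx.copy()
    for c in np.unique(labels):
        mask = labels == c
        med = np.median(logx[mask])
        out[mask] = med + shrink * (logx[mask] - med)
    return out, labels
```

Consider a marker with two well-separated lognormal populations. Calling `lognormal_shrinkage(values, k=2, shrink=0.5)` halves the spread of each population in log space and keeps each component median fixed. Tighter, better-separated groups are what a downstream UMAP/HDBSCAN step would be expected to benefit from.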
Affiliation(s)
- Lorenzo Dall’Olio
- Department of Physics and Astronomy, University of Bologna, 40127 Bologna, Italy
- Maddalena Bolognesi
- Department of Medicine and Surgery, University of Milano Bicocca, 20900 Monza, Italy
- Simone Borghesi
- Department of Mathematics and Applications, University of Milano Bicocca, 20126 Milan, Italy
- Giorgio Cattoretti
- Department of Medicine and Surgery, University of Milano Bicocca, 20900 Monza, Italy
- Gastone Castellani
- Department of Experimental, Diagnostic and Specialty Medicine, University of Bologna, 40127 Bologna, Italy
28
Zhang Z, Sun H, Mariappan R, Chen X, Chen X, Jain MS, Efremova M, Teichmann SA, Rajan V, Zhang X. scMoMaT jointly performs single cell mosaic integration and multi-modal bio-marker detection. Nat Commun 2023; 14:384. [PMID: 36693837 PMCID: PMC9873790 DOI: 10.1038/s41467-023-36066-2] [Received: 09/07/2022] [Accepted: 01/13/2023]
Abstract
Single cell data integration methods aim to integrate cells across data batches and modalities. Data integration tasks can be categorized into horizontal, vertical, diagonal, and mosaic integration, where mosaic integration is the most general and challenging case, for which few methods have been developed. We propose scMoMaT, a method that integrates single cell multi-omics data in the mosaic integration scenario using matrix tri-factorization. During integration, scMoMaT also uncovers cluster-specific bio-markers across modalities. These multi-modal bio-markers are used to interpret and annotate the clusters as cell types. Moreover, scMoMaT can integrate cell batches with unequal cell type compositions. Applying scMoMaT to multiple real and simulated datasets demonstrated these features and showed that scMoMaT outperforms existing methods. Specifically, we show that the integrated cell embedding combined with learned bio-markers leads to cell type annotations of higher quality or resolution than the original annotations.
Affiliation(s)
- Ziqi Zhang
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, USA
- Haoran Sun
- School of Mathematics, Georgia Institute of Technology, Atlanta, GA, USA
- Ragunathan Mariappan
- Department of Information Systems and Analytics, National University of Singapore, Singapore, Singapore
- Xi Chen
- Department of Biology, Southern University of Science and Technology, Shenzhen, Guangdong, China
- Xinyu Chen
- Bioengineering Program, Georgia Institute of Technology, Atlanta, GA, USA
- Vaibhav Rajan
- Department of Information Systems and Analytics, National University of Singapore, Singapore, Singapore
- Xiuwei Zhang
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, USA
29
Park J, Kim J, Lewy T, Rice CM, Elemento O, Rendeiro AF, Mason CE. Spatial omics technologies at multimodal and single cell/subcellular level. Genome Biol 2022; 23:256. [PMID: 36514162 PMCID: PMC9746133 DOI: 10.1186/s13059-022-02824-6] [Received: 05/06/2022] [Accepted: 11/29/2022]
Abstract
Spatial omics technologies enable a deeper understanding of cellular organizations and interactions within a tissue of interest. These assays can identify specific compartments or regions in a tissue with differential transcript or protein abundance, delineate their interactions, and complement other methods in defining cellular phenotypes. A variety of spatial methodologies are being developed and commercialized; however, these techniques differ in spatial resolution, multiplexing capability, scale/throughput, and coverage. Here, we review the current and prospective landscape of single cell to subcellular resolution spatial omics technologies and analysis tools to provide a comprehensive picture for both research and clinical applications.
Affiliation(s)
- Jiwoon Park
- Department of Physiology, Biophysics and Systems Biology, Weill Cornell Medicine, New York, NY, USA
- Laboratory of Virology and Infectious Disease, The Rockefeller University, New York, NY, 10065, USA
- Junbum Kim
- Department of Physiology, Biophysics and Systems Biology, Weill Cornell Medicine, New York, NY, USA
- Tyler Lewy
- Laboratory of Virology and Infectious Disease, The Rockefeller University, New York, NY, 10065, USA
- Charles M Rice
- Laboratory of Virology and Infectious Disease, The Rockefeller University, New York, NY, 10065, USA
- Olivier Elemento
- Department of Physiology, Biophysics and Systems Biology, Weill Cornell Medicine, New York, NY, USA
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, USA
- Caryl and Israel Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY, USA
- André F Rendeiro
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, USA
- Caryl and Israel Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY, USA
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Christopher E Mason
- Department of Physiology, Biophysics and Systems Biology, Weill Cornell Medicine, New York, NY, USA
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, USA
- The Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY, USA
- The WorldQuant Initiative for Quantitative Prediction, Weill Cornell Medicine, New York, NY, USA
30
Cao K, Gong Q, Hong Y, Wan L. A unified computational framework for single-cell data integration with optimal transport. Nat Commun 2022; 13:7419. [PMID: 36456571 PMCID: PMC9715710 DOI: 10.1038/s41467-022-35094-8] [Received: 06/15/2022] [Accepted: 11/18/2022]
Abstract
Single-cell data integration can provide a comprehensive molecular view of cells. However, integrating heterogeneous single-cell multi-omics and spatially resolved transcriptomic data remains a major challenge. Here we introduce uniPort, a unified single-cell data integration framework that combines a coupled variational autoencoder (coupled-VAE) and minibatch unbalanced optimal transport (Minibatch-UOT). It leverages both highly variable common genes and dataset-specific genes for integration to handle heterogeneity across datasets, and it scales to large datasets. uniPort jointly embeds heterogeneous single-cell multi-omics datasets into a shared latent space. It can further construct a reference atlas for gene imputation across datasets. In addition, uniPort provides a flexible label transfer framework that deconvolutes heterogeneous spatial transcriptomic data using an optimal transport plan rather than the embedded latent space. We demonstrate the capability of uniPort by applying it to integrate a variety of datasets, including single-cell transcriptomics, chromatin accessibility, and spatially resolved transcriptomic data.
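The unbalanced optimal transport component can be pictured with the generic scaling algorithm for entropic unbalanced OT. In that algorithm, KL penalties relax the marginal constraints. The sketch below computes a transport plan for a single minibatch of cells from a precomputed cost matrix. It is an illustration under assumptions, not uniPort's code. The function name and the parameters `eps` (entropic regularization) and `rho` (marginal relaxation) are hypothetical.

```python
import numpy as np


def unbalanced_sinkhorn(a, b, cost, eps=0.1, rho=1.0, n_iter=500):
    """Entropic unbalanced OT with KL-relaxed marginals (generic scaling scheme).

    a, b: nonnegative masses of source and target cells in one minibatch.
    cost: (len(a), len(b)) pairwise costs, e.g. squared latent distances.
    eps:  entropic regularization strength.
    rho:  marginal relaxation; larger values enforce the marginals more
          strictly (rho -> infinity approaches balanced Sinkhorn).
    Returns the transport plan P = diag(u) K diag(v).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    kernel = np.exp(-cost / eps)
    # Exponent of the scaling updates induced by the KL marginal penalty.
    power = rho / (rho + eps)
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = (a / (kernel @ v)) ** power
        v = (b / (kernel.T @ u)) ** power
    return u[:, None] * kernel * v[None, :]
```

With a small `rho`, some mass may remain untransported. This is the usual motivation for unbalanced OT when cell-type compositions differ between datasets. As `rho` grows, the row and column sums of the plan approach `a` and `b`.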
Affiliation(s)
- Kai Cao
- LSC, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
- School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, China
- Qiyu Gong
- Shanghai Institute of Immunology, Faculty of Basic Medicine, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Yiguang Hong
- Department of Control Science and Engineering, Tongji University, Shanghai, China
- Lin Wan
- LSC, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
- School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, China
31
Xu Y, Begoli E, McCord RP. sciCAN: single-cell chromatin accessibility and gene expression data integration via cycle-consistent adversarial network. NPJ Syst Biol Appl 2022; 8:33. [PMID: 36089620 PMCID: PMC9464763 DOI: 10.1038/s41540-022-00245-6] [Received: 01/13/2022] [Accepted: 09/01/2022]
Abstract
The boom in single-cell technologies has brought a surge of high-dimensional data that come from different sources and represent cellular systems from different views. With advances in these technologies, integrating single-cell data across modalities has emerged as a new computational challenge. Here, we present an adversarial approach, sciCAN, to integrate single-cell chromatin accessibility and gene expression data in an unsupervised manner. We benchmarked sciCAN against 5 existing methods on 5 scATAC-seq/scRNA-seq datasets and demonstrated that our method integrated data with consistent performance across datasets and a better balance of mutual transfer between modalities than the other methods. We further applied sciCAN to 10X Multiome data and confirmed that the integrated representation preserves biological relationships within the hematopoietic hierarchy. Finally, we investigated CRISPR-perturbed single-cell K562 ATAC-seq and RNA-seq data to identify cells with related responses to different perturbations in these different modalities.
Affiliation(s)
- Yang Xu
- UT-ORNL Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN, USA
- Edmon Begoli
- Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN, USA
- Rachel Patton McCord
- Biochemistry & Cellular and Molecular Biology Department, University of Tennessee, Knoxville, TN, USA
32
Zhang Z, Yang C, Zhang X. scDART: integrating unmatched scRNA-seq and scATAC-seq data and learning cross-modality relationship simultaneously. Genome Biol 2022; 23:139. [PMID: 35761403 PMCID: PMC9238247 DOI: 10.1186/s13059-022-02706-x] [Received: 01/04/2022] [Accepted: 06/14/2022]
Abstract
Integrating scRNA-seq and scATAC-seq data obtained from different batches is a challenging task. Existing methods tend to use a pre-defined gene activity matrix to convert scATAC-seq data into scRNA-seq data. The pre-defined gene activity matrix is often of low quality and does not reflect the dataset-specific relationship between the two data modalities. We propose scDART, a deep learning framework that integrates scRNA-seq and scATAC-seq data and learns cross-modality relationships simultaneously. Specifically, the design of scDART allows it to preserve cell trajectories in continuous cell populations, so it can be applied to trajectory inference on integrated data.
Affiliation(s)
- Ziqi Zhang
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, 30308 GA USA
- Chengkai Yang
- Department of Electrical Engineering and Information Systems, Graduate School of Engineering, The University of Tokyo, Tokyo, Japan
- Xiuwei Zhang
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, 30308 GA USA
33
Xu Y, McCord RP. Diagonal integration of multimodal single-cell data: potential pitfalls and paths forward. Nat Commun 2022; 13:3505. [PMID: 35717437 PMCID: PMC9206644 DOI: 10.1038/s41467-022-31104-x] [Received: 11/12/2021] [Accepted: 06/06/2022]
Affiliation(s)
- Yang Xu
- UT-ORNL Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN 37996, USA
- Rachel Patton McCord
- Department of Biochemistry & Cellular and Molecular Biology, University of Tennessee, 309 Ken and Blaire Mossman Bldg, 1311 Cumberland Ave, Knoxville, TN 37996, USA
34
UINMF performs mosaic integration of single-cell multi-omic datasets using nonnegative matrix factorization. Nat Commun 2022; 13:780. [PMID: 35140223 PMCID: PMC8828882 DOI: 10.1038/s41467-022-28431-4] [Received: 04/09/2021] [Accepted: 01/21/2022]
Abstract
Single-cell genomic technologies provide an unprecedented opportunity to define molecular cell types in a data-driven fashion, but present unique data integration challenges. Many analyses require “mosaic integration”, including both features shared across datasets and features exclusive to a single experiment. Previous computational integration approaches require that the input matrices share the same number of either genes or cells, and thus can use only shared features. To address this limitation, we derive a nonnegative matrix factorization algorithm for integrating single-cell datasets containing both shared and unshared features. The key advance is incorporating an additional metagene matrix that allows unshared features to inform the factorization. We demonstrate that incorporating unshared features significantly improves integration of single-cell RNA-seq, spatial transcriptomic, SNARE-seq, and cross-species datasets. We have incorporated the UINMF algorithm into the open-source LIGER R package (https://github.com/welch-lab/liger).
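The following sketch illustrates how an additional metagene matrix lets unshared features inform the factorization. Shared metagenes `w` are fit to two datasets. A separate matrix `u1` explains features measured only in dataset 1, and both act on the same cell factors `h1`.

This is a simplified, assumption-laden version, not the UINMF algorithm in LIGER:

- It omits the dataset-specific metagene matrices and the regularization penalty of the full integrative NMF objective.
- It uses Lee-Seung multiplicative updates for a plain squared-error loss.
- All function and variable names are hypothetical.

```python
import numpy as np


def joint_loss(x1, p1, x2, w, u1, h1, h2):
    """Squared reconstruction error summed over all three matrices."""
    return float(np.sum((x1 - w @ h1) ** 2) + np.sum((p1 - u1 @ h1) ** 2)
                 + np.sum((x2 - w @ h2) ** 2))


def integrative_nmf(x1, p1, x2, k, n_iter=300, seed=0, tiny=1e-10):
    """Simplified integrative NMF with an extra metagene matrix u1.

    x1: shared features x cells of dataset 1 (nonnegative)
    p1: features measured only in dataset 1 x cells of dataset 1
    x2: shared features x cells of dataset 2
    Minimizes ||x1 - W H1||^2 + ||p1 - U1 H1||^2 + ||x2 - W H2||^2.
    """
    rng = np.random.default_rng(seed)
    w = rng.random((x1.shape[0], k))
    u1 = rng.random((p1.shape[0], k))
    h1 = rng.random((k, x1.shape[1]))
    h2 = rng.random((k, x2.shape[1]))
    losses = [joint_loss(x1, p1, x2, w, u1, h1, h2)]
    for _ in range(n_iter):
        # Dataset-1 cell factors see both shared and unshared features.
        h1 *= (w.T @ x1 + u1.T @ p1) / (w.T @ w @ h1 + u1.T @ u1 @ h1 + tiny)
        h2 *= (w.T @ x2) / (w.T @ w @ h2 + tiny)
        # Shared metagenes are fit to both datasets at once.
        w *= (x1 @ h1.T + x2 @ h2.T) / (w @ (h1 @ h1.T + h2 @ h2.T) + tiny)
        # Unshared metagenes only explain dataset-1-specific features.
        u1 *= (p1 @ h1.T) / (u1 @ (h1 @ h1.T) + tiny)
        losses.append(joint_loss(x1, p1, x2, w, u1, h1, h2))
    return w, u1, h1, h2, losses
```

Each update is a standard multiplicative step for one block with the others held fixed, so the joint loss should not increase. Because `h1` is updated from both `x1` and `p1`, the unshared features shape the dataset-1 cell factors. Meanwhile, the shared `w` keeps `h1` and `h2` comparable.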
35
Abstract
Motivation: The advent of multi-modal single-cell sequencing techniques has shed new light on molecular mechanisms by simultaneously inspecting the transcriptomes, epigenomes and proteomes of the same cell. However, to date, existing computational approaches for integrating multimodal single-cell data are either computationally expensive, require parameters to be specified, or can only be applied to particular modalities.
Results: Here we present a single-cell multi-modal integration method, named Multi-mOdal Joint IntegraTion of cOmpOnents (MOJITOO). MOJITOO uses canonical correlation analysis for fast, parameter-free detection of a shared representation of cells from multimodal single-cell data. Moreover, the estimated canonical components can be used for interpretation, i.e. to associate modality-specific molecular features with the latent space. We evaluate MOJITOO using bi- and tri-modal single-cell datasets and show that MOJITOO outperforms existing methods regarding computational requirements, preservation of original latent spaces and clustering.
Availability and implementation: The software, code and data for benchmarking are available at https://github.com/CostaLab/MOJITOO and https://doi.org/10.5281/zenodo.6348128.
Supplementary information: Supplementary data are available at Bioinformatics online.
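The core canonical correlation step can be sketched with classical two-view CCA, computed by whitening each modality and taking an SVD of the whitened cross-covariance. This is an illustration, not MOJITOO's code. It assumes two precomputed per-cell embeddings, for example a PCA of RNA and an LSI of ATAC, and handles only two modalities. The function names and the small `ridge` stabilizer are hypothetical.

```python
import numpy as np


def _inv_sqrt(cov):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(cov)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T


def cca(x, y, n_components=2, ridge=1e-8):
    """Classical CCA between two embeddings of the same cells.

    x: cells x d1, y: cells x d2 (rows must refer to the same cells).
    Returns the canonical correlations and the canonical variates of
    each modality (unit-variance projections, cells x n_components).
    """
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    n = x.shape[0]
    cxx = xc.T @ xc / (n - 1) + ridge * np.eye(x.shape[1])
    cyy = yc.T @ yc / (n - 1) + ridge * np.eye(y.shape[1])
    cxy = xc.T @ yc / (n - 1)
    wx_half, wy_half = _inv_sqrt(cxx), _inv_sqrt(cyy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    u, s, vt = np.linalg.svd(wx_half @ cxy @ wy_half)
    wx = wx_half @ u[:, :n_components]
    wy = wy_half @ vt.T[:, :n_components]
    return s[:n_components], xc @ wx, yc @ wy
```

The paired columns of the two returned variate matrices correlate exactly at the returned canonical correlations. One possible shared cell representation for clustering is the average of the two sets of variates. That combination rule is an assumption here, not necessarily what MOJITOO does.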
Affiliation(s)
- Mingbo Cheng
- Institute for Computational Genomics, Joint Research Center for Computational Biomedicine, RWTH Aachen University Medical School, 52074 Aachen, Germany
- Zhijian Li
- Institute for Computational Genomics, Joint Research Center for Computational Biomedicine, RWTH Aachen University Medical School, 52074 Aachen, Germany
- Ivan G Costa
- To whom correspondence should be addressed. E-mail: