Reference Citation Analysis

For: Cao Y, Yang P, Yang JYH. A benchmark study of simulation methods for single-cell RNA sequencing data. Nat Commun 2021;12:6911. [PMID: 34824223 PMCID: PMC8617278 DOI: 10.1038/s41467-021-27130-w] [Citation(s) in RCA: 36] [Impact Index Per Article: 9.0] [Received: 05/27/2021] [Accepted: 10/26/2021] [Indexed: 11/09/2022] Open

Cited by Other Article(s)

Asloudj Y, Mougin F, Thébault P. scEVE: a single-cell RNA-seq ensemble clustering algorithm capitalizing on the differences of predictions between multiple clustering methods. NAR Genom Bioinform 2025;7:lqaf073. [PMID: 40491972 PMCID: PMC12147100 DOI: 10.1093/nargab/lqaf073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2024] [Revised: 05/07/2025] [Accepted: 05/20/2025] [Indexed: 06/11/2025] Open

Pavel A, Grønberg MG, Clemmensen LH. The impact of dropouts in scRNAseq dense neighborhood analysis. Comput Struct Biotechnol J 2025;27:1278-1285. [PMID: 40225837 PMCID: PMC11992407 DOI: 10.1016/j.csbj.2025.03.033] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/01/2024] [Revised: 03/19/2025] [Accepted: 03/20/2025] [Indexed: 04/15/2025] Open

Abstract

Single cell RNA sequencing (scRNAseq) provides the possibility to investigate transcriptomic profiles on a single cell level. However, the data show unique challenges in comparison to bulk transcriptomic data, one being high dropout rates, which yield highly sparse data. Many classical analysis and preprocessing pipelines are based on the assumption that poor data can be counteracted by quantity and that similar cells (samples) are close to each other in space. Clustering is commonly used to detect clusters (dense local cell neighborhoods) under the assumption that similar cells are close to each other in space (where close is dependent on the (distance) metric used). The most commonly used clustering methodologies to detect dense local neighborhoods are based on graph clustering on a nearest neighbor graph. However, high dropout rates may break this assumption and make it difficult to reliably detect such dense local neighborhoods. We assess the cluster homogeneity and stability under increasing degrees of dropouts in one of the most popular clustering pipelines (dimensionality reduction + graph based clustering), as provided by the scRNAseq analysis packages Seurat and Scanpy. Our study showcases that while the default pipeline performs well in terms of cluster homogeneity (i.e., cells in a cluster are of the same type), also with increasing dropout rates, the stability of clusters (i.e., cell pairs consistently being in the same cluster) decreases. This implies that sub-populations within cell types are increasingly difficult to identify under increasing dropout rates because observations are not consistently close. Our results challenge the current practice of using default clustering pipelines and the general assumption of identifiable local neighborhoods on high dropout data. Hence, these results suggest that careful consideration is needed in interpretation and downstream analysis when relying on local neighborhoods and clusters on scRNAseq data. In addition, these results call for extensive benchmarking, to identify and provide methods robust in their local neighborhood relationships on data containing low to high dropout rates.

Liang X, Torkel M, Cao Y, Yang JYH. Multi-task benchmarking of spatially resolved gene expression simulation models. Genome Biol 2025;26:57. [PMID: 40098171 PMCID: PMC11912772 DOI: 10.1186/s13059-025-03505-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2024] [Accepted: 02/12/2025] [Indexed: 03/19/2025] Open

Ge S, Sun S, Xu H, Cheng Q, Ren Z. Deep learning in single-cell and spatial transcriptomics data analysis: advances and challenges from a data science perspective. Brief Bioinform 2025;26:bbaf136. [PMID: 40185158 PMCID: PMC11970898 DOI: 10.1093/bib/bbaf136] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2024] [Revised: 02/17/2025] [Accepted: 03/05/2025] [Indexed: 04/07/2025] Open

Abstract

The development of single-cell and spatial transcriptomics has revolutionized our capacity to investigate cellular properties, functions, and interactions in both cellular and spatial contexts. Despite this progress, the analysis of single-cell and spatial omics data remains challenging. First, single-cell sequencing data are high-dimensional and sparse, and are often contaminated by noise and uncertainty, obscuring the underlying biological signal. Second, these data often encompass multiple modalities, including gene expression, epigenetic modifications, metabolite levels, and spatial locations. Integrating these diverse data modalities is crucial for enhancing prediction accuracy and biological interpretability. Third, while the scale of single-cell sequencing has expanded to millions of cells, high-quality annotated datasets are still limited. Fourth, the complex correlations of biological tissues make it difficult to accurately reconstruct cellular states and spatial contexts. Traditional feature engineering approaches struggle with the complexity of biological networks, while deep learning, with its ability to handle high-dimensional data and automatically identify meaningful patterns, has shown great promise in overcoming these challenges. Besides systematically reviewing the strengths and weaknesses of advanced deep learning methods, we have curated 21 datasets from nine benchmarks to evaluate the performance of 58 computational methods. Our analysis reveals that model performance can vary significantly across different benchmark datasets and evaluation metrics, providing a useful perspective for selecting the most appropriate approach based on a specific application scenario. We highlight three key areas for future development, offering valuable insights into how deep learning can be effectively applied to transcriptomic data analysis in biological, medical, and clinical settings.

Monzó C, Aguerralde-Martin M, Martínez-Mira C, Arzalluz-Luque Á, Conesa A, Tarazona S. MOSim: bulk and single-cell multilayer regulatory network simulator. Brief Bioinform 2025;26:bbaf110. [PMID: 40116657 PMCID: PMC11926980 DOI: 10.1093/bib/bbaf110] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2024] [Revised: 02/13/2025] [Accepted: 02/21/2025] [Indexed: 03/23/2025] Open

Liu S, Corcoran D, Garcia-Recio S, Marron JS, Perou C. Crafted experiments to evaluate feature selection methods for single-cell RNA-seq data. NAR Genom Bioinform 2025;7:lqaf023. [PMID: 40109353 PMCID: PMC11920870 DOI: 10.1093/nargab/lqaf023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2024] [Revised: 01/17/2025] [Accepted: 02/24/2025] [Indexed: 03/22/2025] Open

Pouyabahar D, Andrews T, Bader GD. Interpretable single-cell factor decomposition using sciRED. Nat Commun 2025;16:1878. [PMID: 39987196 PMCID: PMC11846867 DOI: 10.1038/s41467-025-57157-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2024] [Accepted: 02/10/2025] [Indexed: 02/24/2025] Open

Zhao B, Song K, Wei DQ, Xiong Y, Ding J. scCobra allows contrastive cell embedding learning with domain adaptation for single cell data integration and harmonization. Commun Biol 2025;8:233. [PMID: 39948393 PMCID: PMC11825689 DOI: 10.1038/s42003-025-07692-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2024] [Accepted: 02/06/2025] [Indexed: 02/16/2025] Open

Brombacher E, Schilling O, Kreutz C. Characterizing the omics landscape based on 10,000+ datasets. Sci Rep 2025;15:3189. [PMID: 39863642 PMCID: PMC11762699 DOI: 10.1038/s41598-025-87256-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2024] [Accepted: 01/17/2025] [Indexed: 01/27/2025] Open

CZI Cell Science Program, Abdulla S, Aevermann B, Assis P, Badajoz S, Bell SM, Bezzi E, Cakir B, Chaffer J, Chambers S, Cherry J, Chi T, Chien J, Dorman L, Garcia-Nieto P, Gloria N, Hastie M, Hegeman D, Hilton J, Huang T, Infeld A, Istrate AM, Jelic I, Katsuya K, Kim YJ, Liang K, Lin M, Lombardo M, Marshall B, Martin B, McDade F, Megill C, Patel N, Predeus A, Raymor B, Robatmili B, Rogers D, Rutherford E, Sadgat D, Shin A, Small C, Smith T, Sridharan P, Tarashansky A, Tavares N, Thomas H, Tolopko A, Urisko M, Yan J, Yeretssian G, Zamanian J, Mani A, Cool J, Carr A. CZ CELLxGENE Discover: a single-cell data platform for scalable exploration, analysis and modeling of aggregated data. Nucleic Acids Res 2025;53:D886-D900. [PMID: 39607691 PMCID: PMC11701654 DOI: 10.1093/nar/gkae1142] [Citation(s) in RCA: 36] [Impact Index Per Article: 36.0] [Received: 10/05/2024] [Revised: 10/28/2024] [Accepted: 11/01/2024] [Indexed: 11/29/2024] Open

Affiliation(s)

CZI Cell Science Program
Shibla Abdulla Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK
Brian Aevermann Chan Zuckerberg Initiative, 1180 Main Street, Redwood City, CA 94063, USA
Pedro Assis Department of Genetics, Stanford University School of Medicine, 291 Campus Drive, Li Ka Shing Building, Stanford, CA 94305, USA
Seve Badajoz Chan Zuckerberg Initiative, 1180 Main Street, Redwood City, CA 94063, USA
Sidney M Bell Chan Zuckerberg Initiative, 1180 Main Street, Redwood City, CA 94063, USA
Emanuele Bezzi Chan Zuckerberg Initiative, 1180 Main Street, Redwood City, CA 94063, USA
Batuhan Cakir Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK
Jim Chaffer Department of Genetics, Stanford University School of Medicine, 291 Campus Drive, Li Ka Shing Building, Stanford, CA 94305, USA
Signe Chambers Chan Zuckerberg Initiative, 1180 Main Street, Redwood City, CA 94063, USA
J Michael Cherry Department of Genetics, Stanford University School of Medicine, 291 Campus Drive, Li Ka Shing Building, Stanford, CA 94305, USA
Tiffany Chi Chan Zuckerberg Initiative, 1180 Main Street, Redwood City, CA 94063, USA
Jennifer Chien Department of Genetics, Stanford University School of Medicine, 291 Campus Drive, Li Ka Shing Building, Stanford, CA 94305, USA
Leah Dorman Chan Zuckerberg Biohub, SF, 499 Illinois St, San Francisco, CA 94158, USA
Pablo Garcia-Nieto Chan Zuckerberg Initiative, 1180 Main Street, Redwood City, CA 94063, USA
Nayib Gloria Chan Zuckerberg Initiative, 1180 Main Street, Redwood City, CA 94063, USA
Mim Hastie Clever Canary, 850 Front St. #1491, Santa Cruz, CA, USA
Daniel Hegeman Chan Zuckerberg Initiative, 1180 Main Street, Redwood City, CA 94063, USA
Jason Hilton Department of Genetics, Stanford University School of Medicine, 291 Campus Drive, Li Ka Shing Building, Stanford, CA 94305, USA
Timmy Huang Chan Zuckerberg Initiative, 1180 Main Street, Redwood City, CA 94063, USA
Amanda Infeld Chan Zuckerberg Initiative, 1180 Main Street, Redwood City, CA 94063, USA
Ana-Maria Istrate Chan Zuckerberg Initiative, 1180 Main Street, Redwood City, CA 94063, USA
Ivana Jelic Chan Zuckerberg Initiative, 1180 Main Street, Redwood City, CA 94063, USA
Kuni Katsuya Chan Zuckerberg Initiative, 1180 Main Street, Redwood City, CA 94063, USA
Yang Joon Kim Chan Zuckerberg Biohub, SF, 499 Illinois St, San Francisco, CA 94158, USA
Karen Liang Chan Zuckerberg Initiative, 1180 Main Street, Redwood City, CA 94063, USA
Mike Lin Chan Zuckerberg Initiative, 1180 Main Street, Redwood City, CA 94063, USA
Maximilian Lombardo Chan Zuckerberg Initiative, 1180 Main Street, Redwood City, CA 94063, USA
Bailey Marshall Chan Zuckerberg Initiative, 1180 Main Street, Redwood City, CA 94063, USA
Bruce Martin Chan Zuckerberg Initiative, 1180 Main Street, Redwood City, CA 94063, USA
Fran McDade Clever Canary, 850 Front St. #1491, Santa Cruz, CA, USA
Colin Megill Chan Zuckerberg Initiative, 1180 Main Street, Redwood City, CA 94063, USA
Nikhil Patel Chan Zuckerberg Initiative, 1180 Main Street, Redwood City, CA 94063, USA
Alexander Predeus Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK
Brian Raymor Chan Zuckerberg Initiative, 1180 Main Street, Redwood City, CA 94063, USA
Behnam Robatmili Chan Zuckerberg Initiative, 1180 Main Street, Redwood City, CA 94063, USA
Dave Rogers Clever Canary, 850 Front St. #1491, Santa Cruz, CA, USA
Erica Rutherford Department of Genetics, Stanford University School of Medicine, 291 Campus Drive, Li Ka Shing Building, Stanford, CA 94305, USA
Dana Sadgat Chan Zuckerberg Initiative, 1180 Main Street, Redwood City, CA 94063, USA
Andrew Shin Chan Zuckerberg Initiative, 1180 Main Street, Redwood City, CA 94063, USA
Corinn Small Department of Genetics, Stanford University School of Medicine, 291 Campus Drive, Li Ka Shing Building, Stanford, CA 94305, USA
Trent Smith Chan Zuckerberg Initiative, 1180 Main Street, Redwood City, CA 94063, USA
Prathap Sridharan Chan Zuckerberg Initiative, 1180 Main Street, Redwood City, CA 94063, USA
Alexander Tarashansky Chan Zuckerberg Initiative, 1180 Main Street, Redwood City, CA 94063, USA
Norbert Tavares Chan Zuckerberg Initiative, 1180 Main Street, Redwood City, CA 94063, USA
Harley Thomas Chan Zuckerberg Initiative, 1180 Main Street, Redwood City, CA 94063, USA
Andrew Tolopko Chan Zuckerberg Initiative, 1180 Main Street, Redwood City, CA 94063, USA
Meghan Urisko Chan Zuckerberg Initiative, 1180 Main Street, Redwood City, CA 94063, USA
Joyce Yan Chan Zuckerberg Initiative, 1180 Main Street, Redwood City, CA 94063, USA
Garabet Yeretssian Chan Zuckerberg Initiative, 1180 Main Street, Redwood City, CA 94063, USA
Jennifer Zamanian Department of Genetics, Stanford University School of Medicine, 291 Campus Drive, Li Ka Shing Building, Stanford, CA 94305, USA
Arathi Mani Chan Zuckerberg Initiative, 1180 Main Street, Redwood City, CA 94063, USA
Jonah Cool Chan Zuckerberg Initiative, 1180 Main Street, Redwood City, CA 94063, USA
Ambrose Carr Chan Zuckerberg Initiative, 1180 Main Street, Redwood City, CA 94063, USA

Pouyabahar D, Andrews T, Bader GD. Interpretable single-cell factor decomposition using sciRED. bioRxiv 2024:2024.08.01.605536. [PMID: 39149356 PMCID: PMC11326131 DOI: 10.1101/2024.08.01.605536] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/17/2024]

Shan X, Zhao H. Inferring Cell-Type-Specific Co-Expressed Genes from Single Cell Data. bioRxiv 2024:2024.11.08.622700. [PMID: 39605403 PMCID: PMC11601408 DOI: 10.1101/2024.11.08.622700] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2024]

Cui S, Nassiri S, Zakeri I. Mcadet: A feature selection method for fine-resolution single-cell RNA-seq data based on multiple correspondence analysis and community detection. PLoS Comput Biol 2024;20:e1012560. [PMID: 39466833 PMCID: PMC11542852 DOI: 10.1371/journal.pcbi.1012560] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2023] [Revised: 11/07/2024] [Accepted: 10/15/2024] [Indexed: 10/30/2024] Open

Zhang J, Larschan E, Bigness J, Singh R. scNODE: generative model for temporal single cell transcriptomic data prediction. Bioinformatics 2024;40:ii146-ii154. [PMID: 39230694 PMCID: PMC11373355 DOI: 10.1093/bioinformatics/btae393] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/05/2024] Open

Garbulowski M, Hillerton T, Morgan D, Seçilmiş D, Sonnhammer L, Tjärnberg A, Nordling TEM, Sonnhammer ELL. GeneSPIDER2: large scale GRN simulation and benchmarking with perturbed single-cell data. NAR Genom Bioinform 2024;6:lqae121. [PMID: 39296931 PMCID: PMC11409065 DOI: 10.1093/nargab/lqae121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2024] [Revised: 08/20/2024] [Accepted: 09/02/2024] [Indexed: 09/21/2024] Open

Pouyabahar D, Andrews T, Bader GD. Interpretable single-cell factor decomposition using sciRED. Res Sq 2024:rs.3.rs-4819117. [PMID: 39149508 PMCID: PMC11326389 DOI: 10.21203/rs.3.rs-4819117/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/17/2024]

Singh A, Khiabanian H. Feature selection followed by a novel residuals-based normalization that includes variance stabilization simplifies and improves single-cell gene expression analysis. BMC Bioinformatics 2024;25:248. [PMID: 39080559 PMCID: PMC11290295 DOI: 10.1186/s12859-024-05872-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2024] [Accepted: 07/16/2024] [Indexed: 08/02/2024] Open

Han G, Yan D, Sun Z, Fang J, Chang X, Wilson L, Liu Y. Bayesian-frequentist hybrid inference framework for single cell RNA-seq analyses. Hum Genomics 2024;18:69. [PMID: 38902839 PMCID: PMC11575015 DOI: 10.1186/s40246-024-00638-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2023] [Accepted: 06/12/2024] [Indexed: 06/22/2024] Open

Duo H, Li Y, Lan Y, Tao J, Yang Q, Xiao Y, Sun J, Li L, Nie X, Zhang X, Liang G, Liu M, Hao Y, Li B. Systematic evaluation with practical guidelines for single-cell and spatially resolved transcriptomics data simulation under multiple scenarios. Genome Biol 2024;25:145. [PMID: 38831386 PMCID: PMC11149245 DOI: 10.1186/s13059-024-03290-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2023] [Accepted: 05/28/2024] [Indexed: 06/05/2024] Open

Abstract

BACKGROUND

Single-cell RNA sequencing (scRNA-seq) and spatially resolved transcriptomics (SRT) have led to groundbreaking advancements in life sciences. To develop bioinformatics tools for scRNA-seq and SRT data and perform unbiased benchmarks, data simulation has been widely adopted by providing explicit ground truth and generating customized datasets. However, the performance of simulation methods under multiple scenarios has not been comprehensively assessed, making it challenging to choose suitable methods without practical guidelines.

RESULTS

We systematically evaluated 49 simulation methods developed for scRNA-seq and/or SRT data in terms of accuracy, functionality, scalability, and usability using 152 reference datasets derived from 24 platforms. SRTsim, scDesign3, ZINB-WaVE, and scDesign2 have the best accuracy performance across various platforms. Unexpectedly, some methods tailored to scRNA-seq data have potential compatibility for simulating SRT data. Lun, SPARSim, and scDesign3-tree outperform other methods under corresponding simulation scenarios. Phenopath, Lun, Simple, and MFA yield high scalability scores but they cannot generate realistic simulated data. Users should consider the trade-offs between method accuracy and scalability (or functionality) when making decisions. Additionally, execution errors are mainly caused by failed parameter estimations and appearance of missing or infinite values in calculations. We provide practical guidelines for method selection, a standard pipeline Simpipe (https://github.com/duohongrui/simpipe; https://doi.org/10.5281/zenodo.11178409), and an online tool Simsite (https://www.ciblab.net/software/simshiny/) for data simulation.

CONCLUSIONS

No method performs best on all criteria, thus a good-yet-not-the-best method is recommended if it solves problems effectively and reasonably. Our comprehensive work provides crucial insights for developers on modeling gene expression data and fosters the simulation process for users.

Affiliation(s)

Hongrui Duo College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
Yinghong Li Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, 400065, People's Republic of China
Yang Lan Institute of Pathology and Southwest Cancer Center, Southwest Hospital, Army Medical University, Chongqing, 400038, People's Republic of China
Jingxin Tao College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
Qingxia Yang Zhejiang Provincial Key Laboratory of Precision Diagnosis and Therapy for Major Gynecological Diseases, Women's Hospital, Zhejiang University School of Medicine, Hangzhou, 310058, People's Republic of China
Yingxue Xiao College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
Jing Sun College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
Lei Li College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
Xiner Nie Key Laboratory of Biorheological Science and Technology, Ministry of Education, Bioengineering College, Chongqing University, Chongqing, 400044, People's Republic of China
Xiaoxi Zhang College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
Guizhao Liang Key Laboratory of Biorheological Science and Technology, Ministry of Education, Bioengineering College, Chongqing University, Chongqing, 400044, People's Republic of China
Mingwei Liu Key Laboratory of Clinical Laboratory Diagnostics, College of Laboratory Medicine, Chongqing Medical University, Chongqing, 400016, People's Republic of China
Youjin Hao College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
Bo Li College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China

Brooks TG, Lahens NF, Mrčela A, Grant GR. Challenges and best practices in omics benchmarking. Nat Rev Genet 2024;25:326-339. [PMID: 38216661 DOI: 10.1038/s41576-023-00679-6] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Accepted: 11/14/2023] [Indexed: 01/14/2024]

Liu F, Yang Y, Xu XS, Yuan M. MESBC: A novel mutually exclusive spectral biclustering method for cancer subtyping. Comput Biol Chem 2024;109:108009. [PMID: 38219419 DOI: 10.1016/j.compbiolchem.2023.108009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Revised: 12/22/2023] [Accepted: 12/24/2023] [Indexed: 01/16/2024]

Abstract

Many soft biclustering algorithms have been developed and applied to various biological and biomedical data analyses. However, few mutually exclusive (hard) biclustering algorithms have been proposed, which could better identify disease or molecular subtypes with survival significance based on genomic or transcriptomic data. In this study, we developed a novel mutually exclusive spectral biclustering (MESBC) algorithm based on spectral method to detect mutually exclusive biclusters. MESBC simultaneously detects relevant features (genes) and corresponding conditions (patients) subgroups and, therefore, automatically uses the signature features for each subtype to perform the clustering. Extensive simulations revealed that MESBC provided superior accuracy in detecting pre-specified biclusters compared with the non-negative matrix factorization (NMF) and Dhillon's algorithm, particularly in very noisy data. Further analysis of the algorithm on real datasets obtained from the TCGA database showed that MESBC provided more accurate (i.e., smaller p-value) overall survival prediction in patients with lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) cancers when compared to the existing, gold-standard subtypes for lung cancers (integrative clustering). Furthermore, MESBC detected several genes with significant prognostic value in both LUAD and LUSC patients. External validation on an independent, unseen GEO dataset of LUAD showed that MESBC-derived clusters based on TCGA data still exhibited clear biclustering patterns and consistent, outstanding prognostic predictability, demonstrating robust generalizability of MESBC. Therefore, MESBC could potentially be used as a risk stratification tool to optimize the treatment for the patient, improve the selection of patients for clinical trials, and contribute to the development of novel therapeutic agents.

Ranek JS, Stallaert W, Milner JJ, Redick M, Wolff SC, Beltran AS, Stanley N, Purvis JE. DELVE: feature selection for preserving biological trajectories in single-cell data. Nat Commun 2024;15:2765. [PMID: 38553455 PMCID: PMC10980758 DOI: 10.1038/s41467-024-46773-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 05/18/2023] [Accepted: 03/07/2024] [Indexed: 04/02/2024] Open

Brooks TG, Lahens NF, Mrčela A, Sarantopoulou D, Nayak S, Naik A, Sengupta S, Choi PS, Grant GR. BEERS2: RNA-Seq simulation through high fidelity in silico modeling. Brief Bioinform 2024;25:bbae164. [PMID: 38605641 PMCID: PMC11009461 DOI: 10.1093/bib/bbae164] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2023] [Revised: 01/26/2024] [Accepted: 03/26/2024] [Indexed: 04/13/2024] Open

Garmire LX, Li Y, Huang Q, Xu C, Teichmann SA, Kaminski N, Pellegrini M, Nguyen Q, Teschendorff AE. Challenges and perspectives in computational deconvolution of genomics data. Nat Methods 2024;21:391-400. [PMID: 38374264 DOI: 10.1038/s41592-023-02166-6] [Citation(s) in RCA: 15] [Impact Index Per Article: 15.0] [Received: 11/04/2022] [Accepted: 12/26/2023] [Indexed: 02/21/2024]

Song D, Wang Q, Yan G, Liu T, Sun T, Li JJ. scDesign3 generates realistic in silico data for multimodal single-cell and spatial omics. Nat Biotechnol 2024;42:247-252. [PMID: 37169966 PMCID: PMC11182337 DOI: 10.1038/s41587-023-01772-1] [Citation(s) in RCA: 34] [Impact Index Per Article: 34.0] [Received: 09/20/2022] [Accepted: 03/30/2023] [Indexed: 05/13/2023]

Fu X, Lin Y, Lin DM, Mechtersheimer D, Wang C, Ameen F, Ghazanfar S, Patrick E, Kim J, Yang JYH. BIDCell: Biologically-informed self-supervised learning for segmentation of subcellular spatial transcriptomics data. Nat Commun 2024;15:509. [PMID: 38218939 PMCID: PMC10787788 DOI: 10.1038/s41467-023-44560-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Accepted: 12/13/2023] [Indexed: 01/15/2024] Open

Affiliation(s)

Xiaohang Fu School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, 2006, Australia; School of Computer Science, The University of Sydney, Sydney, NSW, 2006, Australia; Sydney Precision Data Science Centre, University of Sydney, Sydney, NSW, 2006, Australia; Charles Perkins Centre, The University of Sydney, Sydney, NSW, 2006, Australia; Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR, China
Yingxin Lin School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, 2006, Australia; Sydney Precision Data Science Centre, University of Sydney, Sydney, NSW, 2006, Australia; Charles Perkins Centre, The University of Sydney, Sydney, NSW, 2006, Australia; Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR, China
David M Lin Department of Biomedical Sciences, Cornell University, Ithaca, NY, 14850, USA
Daniel Mechtersheimer School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, 2006, Australia; Sydney Precision Data Science Centre, University of Sydney, Sydney, NSW, 2006, Australia; Charles Perkins Centre, The University of Sydney, Sydney, NSW, 2006, Australia
Chuhan Wang School of Computer Science, The University of Sydney, Sydney, NSW, 2006, Australia; Sydney Precision Data Science Centre, University of Sydney, Sydney, NSW, 2006, Australia; Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR, China
Farhan Ameen School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, 2006, Australia; Sydney Precision Data Science Centre, University of Sydney, Sydney, NSW, 2006, Australia; Charles Perkins Centre, The University of Sydney, Sydney, NSW, 2006, Australia
Shila Ghazanfar School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, 2006, Australia; Sydney Precision Data Science Centre, University of Sydney, Sydney, NSW, 2006, Australia; Charles Perkins Centre, The University of Sydney, Sydney, NSW, 2006, Australia
Ellis Patrick School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, 2006, Australia; Sydney Precision Data Science Centre, University of Sydney, Sydney, NSW, 2006, Australia; Charles Perkins Centre, The University of Sydney, Sydney, NSW, 2006, Australia; Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR, China; The Westmead Institute for Medical Research, Sydney, NSW, 2145, Australia
Jinman Kim School of Computer Science, The University of Sydney, Sydney, NSW, 2006, Australia; Sydney Precision Data Science Centre, University of Sydney, Sydney, NSW, 2006, Australia; Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR, China
Jean Y H Yang School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, 2006, Australia; Sydney Precision Data Science Centre, University of Sydney, Sydney, NSW, 2006, Australia; Charles Perkins Centre, The University of Sydney, Sydney, NSW, 2006, Australia; Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR, China

Feng Y, Wang S, Liu X, Han Y, Xu H, Duan X, Xie W, Tian Z, Yuan Z, Wan Z, Xu L, Qin S, He K, Huang J. Geometric constraint-triggered collagen expression mediates bacterial-host adhesion. Nat Commun 2023;14:8165. [PMID: 38071397 PMCID: PMC10710423 DOI: 10.1038/s41467-023-43827-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2023] [Accepted: 11/21/2023] [Indexed: 12/18/2023] Open

Affiliation(s)

Yuting Feng Department of Mechanics and Engineering Science, College of Engineering, Peking University, 100871, Beijing, China
Shuyi Wang Department of Mechanics and Engineering Science, College of Engineering, Peking University, 100871, Beijing, China
Xiaoye Liu Beijing Traditional Chinese Veterinary Engineering Center and Beijing Key Laboratory of Traditional Chinese Veterinary Medicine, Beijing University of Agriculture, 102206, Beijing, China
Yiming Han Department of Mechanics and Engineering Science, College of Engineering, Peking University, 100871, Beijing, China
Hongwei Xu Department of Mechanics and Engineering Science, College of Engineering, Peking University, 100871, Beijing, China
Xiaocen Duan Department of Mechanics and Engineering Science, College of Engineering, Peking University, 100871, Beijing, China
Wenyue Xie Department of Mechanics and Engineering Science, College of Engineering, Peking University, 100871, Beijing, China
Zhuoling Tian Department of Mechanics and Engineering Science, College of Engineering, Peking University, 100871, Beijing, China; Academy for Advanced Interdisciplinary Studies, Peking University, 100871, Beijing, China
Zuoying Yuan Department of Mechanics and Engineering Science, College of Engineering, Peking University, 100871, Beijing, China
Zhuo Wan Department of Mechanics and Engineering Science, College of Engineering, Peking University, 100871, Beijing, China
Liang Xu Department of Mechanics and Engineering Science, College of Engineering, Peking University, 100871, Beijing, China; Academy for Advanced Interdisciplinary Studies, Peking University, 100871, Beijing, China
Siying Qin School of Life Sciences, Peking University, 100871, Beijing, China
Kangmin He State Key Laboratory of Molecular Developmental Biology, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, 100101, Beijing, China; University of Chinese Academy of Sciences, 100049, Beijing, China
Jianyong Huang Department of Mechanics and Engineering Science, College of Engineering, Peking University, 100871, Beijing, China.

Yang Y, Wang K, Lu Z, Wang T, Wang X. Cytomulate: accurate and efficient simulation of CyTOF data. Genome Biol 2023;24:262. [PMID: 37974276 PMCID: PMC10652542 DOI: 10.1186/s13059-023-03099-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2022] [Accepted: 10/24/2023] [Indexed: 11/19/2023] Open

Li C, Chen X, Chen S, Jiang R, Zhang X. simCAS: an embedding-based method for simulating single-cell chromatin accessibility sequencing data. Bioinformatics 2023;39:btad453. [PMID: 37494428 PMCID: PMC10394124 DOI: 10.1093/bioinformatics/btad453] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/20/2023] [Revised: 06/25/2023] [Accepted: 07/25/2023] [Indexed: 07/28/2023] Open

Mohammad-Taheri S, Tewari V, Kapre R, Rahiminasab E, Sachs K, Tapley Hoyt C, Zucker J, Vitek O. Optimal adjustment sets for causal query estimation in partially observed biomolecular networks. Bioinformatics 2023;39:i494-i503. [PMID: 37387179 PMCID: PMC10311316 DOI: 10.1093/bioinformatics/btad270] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/01/2023] Open

Yan D, Sun Z, Fang J, Cao S, Wang W, Chang X, Badirli S, Fu H, Liu Y. scRAA: the development of a robust and automatic annotation procedure for single-cell RNA sequencing data. J Biopharm Stat 2023:1-14. [PMID: 37162278 DOI: 10.1080/10543406.2023.2208671] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/11/2023]

Liu C, Huang H, Yang P. Multi-task learning from multimodal single-cell omics with Matilda. Nucleic Acids Res 2023;51:e45. [PMID: 36912104 PMCID: PMC10164589 DOI: 10.1093/nar/gkad157] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2022] [Revised: 01/28/2023] [Accepted: 02/21/2023] [Indexed: 03/14/2023] Open

Sun L, Wang G, Zhang Z. SimCH: simulation of single-cell RNA sequencing data by modeling cellular heterogeneity at gene expression level. Brief Bioinform 2023;24:6961608. [PMID: 36575569 DOI: 10.1093/bib/bbac590] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2022] [Revised: 11/08/2022] [Accepted: 12/02/2022] [Indexed: 12/29/2022] Open

Shakola F, Palejev D, Ivanov I. A Framework for Comparison and Assessment of Synthetic RNA-Seq Data. Genes (Basel) 2022;13:2362. [PMID: 36553629 PMCID: PMC9778097 DOI: 10.3390/genes13122362] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/09/2022] [Revised: 12/05/2022] [Accepted: 12/06/2022] [Indexed: 12/16/2022] Open

Robert PA, Akbar R, Frank R, Pavlović M, Widrich M, Snapkov I, Slabodkin A, Chernigovskaya M, Scheffer L, Smorodina E, Rawat P, Mehta BB, Vu MH, Mathisen IF, Prósz A, Abram K, Olar A, Miho E, Haug DTT, Lund-Johansen F, Hochreiter S, Haff IH, Klambauer G, Sandve GK, Greiff V. Unconstrained generation of synthetic antibody-antigen structures to guide machine learning methodology for antibody specificity prediction. Nat Comput Sci 2022;2:845-865. [PMID: 38177393 DOI: 10.1038/s43588-022-00372-4] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 07/16/2021] [Accepted: 11/09/2022] [Indexed: 01/06/2024]

Affiliation(s)

Philippe A Robert Department of Immunology, University of Oslo and Oslo University Hospital, Oslo, Norway.
Rahmad Akbar Department of Immunology, University of Oslo and Oslo University Hospital, Oslo, Norway
Robert Frank Department of Immunology, University of Oslo and Oslo University Hospital, Oslo, Norway
Milena Pavlović Department of Informatics, University of Oslo, Oslo, Norway
Michael Widrich ELLIS Unit Linz and LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Linz, Austria
Igor Snapkov Department of Immunology, University of Oslo and Oslo University Hospital, Oslo, Norway
Andrei Slabodkin Department of Immunology, University of Oslo and Oslo University Hospital, Oslo, Norway
Maria Chernigovskaya Department of Immunology, University of Oslo and Oslo University Hospital, Oslo, Norway
Lonneke Scheffer Department of Informatics, University of Oslo, Oslo, Norway
Eva Smorodina Department of Immunology, University of Oslo and Oslo University Hospital, Oslo, Norway
Puneet Rawat Department of Immunology, University of Oslo and Oslo University Hospital, Oslo, Norway
Brij Bhushan Mehta Department of Immunology, University of Oslo and Oslo University Hospital, Oslo, Norway
Mai Ha Vu Department of Linguistics and Scandinavian Studies, University of Oslo, Oslo, Norway
Ingvild Frøberg Mathisen Department of Immunology, University of Oslo and Oslo University Hospital, Oslo, Norway
Aurél Prósz Danish Cancer Society Research Center, Translational Cancer Genomics, Copenhagen, Denmark
Krzysztof Abram The Novo Nordisk Foundation Center for Biosustainability, Autoflow, DTU Biosustain and IT University of Copenhagen, Copenhagen, Denmark
Alex Olar Department of Complex Systems in Physics, Eötvös Loránd University, Budapest, Hungary
Enkelejda Miho Institute of Medical Engineering and Medical Informatics, School of Life Sciences, FHNW University of Applied Sciences and Arts Northwestern Switzerland, Muttenz, Switzerland; aiNET GmbH, Basel, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
Dag Trygve Tryslew Haug Department of Linguistics and Scandinavian Studies, University of Oslo, Oslo, Norway
Fridtjof Lund-Johansen Department of Immunology, University of Oslo and Oslo University Hospital, Oslo, Norway
Sepp Hochreiter ELLIS Unit Linz and LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Linz, Austria; Institute of Advanced Research in Artificial Intelligence (IARAI), Vienna, Austria
Ingrid Hobæk Haff Department of Mathematics, University of Oslo, Oslo, Norway
Günter Klambauer ELLIS Unit Linz and LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Linz, Austria
Geir Kjetil Sandve Department of Informatics, University of Oslo, Oslo, Norway
Victor Greiff Department of Immunology, University of Oslo and Oslo University Hospital, Oslo, Norway.

Sandve GK, Greiff V. Access to ground truth at unconstrained size makes simulated data as indispensable as experimental data for bioinformatics methods development and benchmarking. Bioinformatics 2022;38:4994-4996. [PMID: 36073940 PMCID: PMC9620827 DOI: 10.1093/bioinformatics/btac612] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 05/04/2021] [Revised: 02/18/2022] [Accepted: 09/08/2022] [Indexed: 11/14/2022] Open

Azodi CB, Zappia L, Oshlack A, McCarthy DJ. splatPop: simulating population scale single-cell RNA sequencing data. Genome Biol 2021;22:341. [PMID: 34911537 DOI: 10.1186/s13059-021-02546-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/26/2021] [Accepted: 11/19/2021] [Indexed: 11/10/2022] Open