1
|
Kalfon J, Samaran J, Peyré G, Cantini L. scPRINT: pre-training on 50 million cells allows robust gene network predictions. Nat Commun 2025; 16:3607. [PMID: 40240364 PMCID: PMC12003772 DOI: 10.1038/s41467-025-58699-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2024] [Accepted: 03/24/2025] [Indexed: 04/18/2025] Open
Abstract
A cell is governed by the interaction of myriads of macromolecules. Inferring such a network of interactions has remained an elusive milestone in cellular biology. Building on recent advances in large foundation models and their ability to learn without supervision, we present scPRINT, a large cell model for the inference of gene networks pre-trained on more than 50 million cells from the cellxgene database. Using innovative pretraining tasks and model architecture, scPRINT pushes large transformer models towards more interpretability and usability when uncovering the complex biology of the cell. Based on our atlas-level benchmarks, scPRINT demonstrates superior performance in gene network inference to the state of the art, as well as competitive zero-shot abilities in denoising, batch effect correction, and cell label prediction. On an atlas of benign prostatic hyperplasia, scPRINT highlights the profound connections between ion exchange, senescence, and chronic inflammation.
Affiliation(s)
- Jérémie Kalfon
  - Institut Pasteur, Université Paris Cité, CNRS UMR 3738, Machine Learning for Integrative Genomics group, F-75015, Paris, France
- Jules Samaran
  - Institut Pasteur, Université Paris Cité, CNRS UMR 3738, Machine Learning for Integrative Genomics group, F-75015, Paris, France
- Gabriel Peyré
  - CNRS and DMA de l'Ecole Normale Supérieure, CNRS, Ecole Normale Supérieure, Université PSL, 75005, Paris, France
- Laura Cantini
  - Institut Pasteur, Université Paris Cité, CNRS UMR 3738, Machine Learning for Integrative Genomics group, F-75015, Paris, France.
|
2
|
Alum EU, Ikpozu EN, Offor CE, Igwenyi IO, Obaroh IO, Ibiam UA, Ukaidi CUA. RNA-based diagnostic innovations: A new frontier in diabetes diagnosis and management. Diab Vasc Dis Res 2025; 22:14791641251334726. [PMID: 40230050 PMCID: PMC12033450 DOI: 10.1177/14791641251334726] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/16/2025] Open
Abstract
Background/Objective: Diabetes mellitus (DM) remains a major global health challenge due to its chronic nature and associated complications. Traditional diagnostic approaches, though effective, often lack the sensitivity required for early-stage detection. Recent advancements in molecular biology have identified RNA molecules, particularly non-coding RNAs such as microRNAs (miRNAs), long non-coding RNAs (lncRNAs), and circular RNAs (circRNAs), as promising biomarkers for diabetes. This review aims to explore the role of RNA-based biomarkers in the diagnosis, prognosis, and management of diabetes, highlighting their potential to revolutionize diabetes care.
Method: A comprehensive literature review was conducted using electronic databases including PubMed, Scopus, and Web of Science. Articles published up to 2024 were screened and analyzed to extract relevant findings related to RNA-based diagnostics in diabetes. Emphasis was placed on studies demonstrating clinical utility, mechanistic insights, and translational potential of RNA molecules.
Results: Numerous RNA species, particularly miRNAs such as miR-375, miR-29, and lncRNAs like H19 and MEG3, exhibit altered expression patterns in diabetic patients. These molecules are involved in key regulatory pathways of glucose metabolism, insulin resistance, and β-cell function. Circulating RNAs are detectable in various biofluids, enabling non-invasive diagnostic approaches. Emerging technologies, including RNA sequencing and liquid biopsy platforms, have enhanced the sensitivity and specificity of RNA detection, fostering the development of novel diagnostic tools and personalized therapeutic strategies.
Conclusion: RNA-based biomarkers hold significant promise in advancing early detection, risk stratification, and therapeutic monitoring in diabetes care. Despite current challenges such as standardization and clinical validation, the integration of RNA diagnostics into routine clinical practice could transform diabetes management, paving the way for precision medicine approaches. Further research and multi-center trials are essential to validate these biomarkers and facilitate their regulatory approval and clinical implementation.
Affiliation(s)
- Esther Ugo Alum
  - Department of Research and Publications, Kampala International University, Uganda
  - Department of Biochemistry, Ebonyi State University, Abakaliki, Nigeria
- Israel Olusegun Obaroh
  - Department of Biological and Environmental Sciences, School of Natural and Applied Sciences, Kampala International University, Uganda
- Udu Ama Ibiam
  - Department of Biochemistry, Ebonyi State University, Abakaliki, Nigeria
  - Department of Biochemistry, College of Science, Evangel University Akaeze, Abakaliki, Nigeria
- Chris U. A. Ukaidi
  - College of Economics and Management, Kampala International University, Uganda
|
3
|
Leote AC, Lopes F, Beyer A. Loss of coordination between basic cellular processes in human aging. Nature Aging 2024; 4:1432-1445. [PMID: 39227753 PMCID: PMC11485205 DOI: 10.1038/s43587-024-00696-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2023] [Accepted: 07/30/2024] [Indexed: 09/05/2024]
Abstract
Age-related loss of gene expression coordination has been reported for distinct cell types and may lead to impaired cellular function. Here we propose a method for quantifying age-related changes in transcriptional regulatory relationships between genes, based on a model learned from external data. We used this method to uncover age-related trends in gene-gene relationships across eight human tissues, which demonstrates that reduced co-expression may also result from coordinated transcriptional responses. Our analyses reveal similar numbers of strengthening and weakening gene-gene relationships with age, impacting both tissue-specific (for example, coagulation in blood) and ubiquitous biological functions. Regulatory relationships becoming weaker with age were established mostly between genes operating in distinct cellular processes. As opposed to that, regulatory relationships becoming stronger with age were established both within and between different cellular functions. Our work reveals that, although most transcriptional regulatory gene-gene relationships are maintained during aging, those with declining regulatory coupling result mostly from a loss of coordination between distinct cellular processes.
Affiliation(s)
- Ana Carolina Leote
  - Cologne Excellence Cluster on Cellular Stress Responses in Age-Associated Diseases (CECAD), University of Cologne, Cologne, Germany
- Francisco Lopes
  - Cologne Excellence Cluster on Cellular Stress Responses in Age-Associated Diseases (CECAD), University of Cologne, Cologne, Germany
  - Department II of Internal Medicine, University of Cologne, Faculty of Medicine and University Hospital Cologne, Cologne, Germany
- Andreas Beyer
  - Cologne Excellence Cluster on Cellular Stress Responses in Age-Associated Diseases (CECAD), University of Cologne, Cologne, Germany.
  - Institute for Genetics, Faculty of Mathematics and Natural Sciences, University of Cologne, Cologne, Germany.
  - Center for Molecular Medicine Cologne (CMMC), University of Cologne, Faculty of Medicine and University Hospital Cologne, Cologne, Germany.
|
4
|
Unger Avila P, Padvitski T, Leote AC, Chen H, Saez-Rodriguez J, Kann M, Beyer A. Gene regulatory networks in disease and ageing. Nat Rev Nephrol 2024; 20:616-633. [PMID: 38867109 DOI: 10.1038/s41581-024-00849-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/15/2024] [Indexed: 06/14/2024]
Abstract
The precise control of gene expression is required for the maintenance of cellular homeostasis and proper cellular function, and the declining control of gene expression with age is considered a major contributor to age-associated changes in cellular physiology and disease. The coordination of gene expression can be represented through models of the molecular interactions that govern gene expression levels, so-called gene regulatory networks. Gene regulatory networks can represent interactions that occur through signal transduction, those that involve regulatory transcription factors, or statistical models of gene-gene relationships based on the premise that certain sets of genes tend to be coexpressed across a range of conditions and cell types. Advances in experimental and computational technologies have enabled the inference of these networks on an unprecedented scale and at unprecedented precision. Here, we delineate different types of gene regulatory networks and their cell-biological interpretation. We describe methods for inferring such networks from large-scale, multi-omics datasets and present applications that have aided our understanding of cellular ageing and disease mechanisms.
Affiliation(s)
- Paula Unger Avila
  - Cluster of Excellence on Cellular Stress Responses in Aging-associated Diseases (CECAD), University of Cologne, Cologne, Germany
- Tsimafei Padvitski
  - Cluster of Excellence on Cellular Stress Responses in Aging-associated Diseases (CECAD), University of Cologne, Cologne, Germany
- Ana Carolina Leote
  - Cluster of Excellence on Cellular Stress Responses in Aging-associated Diseases (CECAD), University of Cologne, Cologne, Germany
- He Chen
  - Cluster of Excellence on Cellular Stress Responses in Aging-associated Diseases (CECAD), University of Cologne, Cologne, Germany
  - Department II of Internal Medicine, Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany
- Julio Saez-Rodriguez
  - Faculty of Medicine and Heidelberg University Hospital, Institute for Computational Biomedicine, Heidelberg University, Heidelberg, Germany
- Martin Kann
  - Department II of Internal Medicine, Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany
  - Center for Molecular Medicine Cologne, Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany
- Andreas Beyer
  - Cluster of Excellence on Cellular Stress Responses in Aging-associated Diseases (CECAD), University of Cologne, Cologne, Germany.
  - Center for Molecular Medicine Cologne, Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany.
  - Institute for Genetics, Faculty of Mathematics and Natural Sciences, University of Cologne, Cologne, Germany.
|
5
|
Park Y, Hauschild AC. The effect of data transformation on low-dimensional integration of single-cell RNA-seq. BMC Bioinformatics 2024; 25:171. [PMID: 38689234 PMCID: PMC11059821 DOI: 10.1186/s12859-024-05788-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Accepted: 04/16/2024] [Indexed: 05/02/2024] Open
Abstract
BACKGROUND: Recent developments in single-cell RNA sequencing have opened up a multitude of possibilities to study tissues at the level of cellular populations. However, the heterogeneity in single-cell sequencing data necessitates appropriate procedures to adjust for technological limitations and various sources of noise when integrating datasets from different studies. While many analysis procedures employ various preprocessing steps, they often overlook the importance of selecting and optimizing the employed data transformation methods.
RESULTS: This work investigates data transformation approaches used in single-cell clustering analysis tools and their effects on batch integration analysis. In particular, we compare 16 transformations and their impact on the low-dimensional representations, aiming to reduce the batch effect and integrate multiple single-cell sequencing data. Our results show that data transformations strongly influence the results of single-cell clustering on low-dimensional data space, such as those generated by UMAP or PCA. Moreover, these changes in low-dimensional space significantly affect trajectory analysis using multiple datasets, as well. However, the performance of the data transformations greatly varies across datasets, and the optimal method was different for each dataset. Additionally, we explored how data transformation impacts the analysis of deep feature encodings using deep neural network-based models, including autoencoder-based models and prototypical networks. Data transformation also strongly affects the outcome of deep neural network models.
CONCLUSIONS: Our findings suggest that the batch effect and noise in integrative analysis are highly influenced by data transformation. Low-dimensional features can integrate different batches well when proper data transformation is applied. Furthermore, we found that the batch mixing score on low-dimensional space can guide the selection of the optimal data transformation. In conclusion, data preprocessing is one of the most crucial analysis steps and needs to be cautiously considered in the integrative analysis of multiple scRNA-seq datasets.
Affiliation(s)
- Youngjun Park
  - Department of Medical Informatics, University Medical Center Göttingen, Göttingen, Germany
  - International Max Planck Research Schools for Genome Science, Georg-August-Universität Göttingen, Göttingen, Germany
- Anne-Christin Hauschild
  - Department of Medical Informatics, University Medical Center Göttingen, Göttingen, Germany.
  - Campus-Institute Data Science (CIDAS), Georg-August-Universität Göttingen, Göttingen, Germany.
|
6
|
Grones C, Eekhout T, Shi D, Neumann M, Berg LS, Ke Y, Shahan R, Cox KL, Gomez-Cano F, Nelissen H, Lohmann JU, Giacomello S, Martin OC, Cole B, Wang JW, Kaufmann K, Raissig MT, Palfalvi G, Greb T, Libault M, De Rybel B. Best practices for the execution, analysis, and data storage of plant single-cell/nucleus transcriptomics. The Plant Cell 2024; 36:812-828. [PMID: 38231860 PMCID: PMC10980355 DOI: 10.1093/plcell/koae003] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 05/02/2023] [Revised: 10/17/2023] [Accepted: 10/24/2023] [Indexed: 01/19/2024]
Abstract
Single-cell and single-nucleus RNA-sequencing technologies capture the expression of plant genes at an unprecedented resolution. Therefore, these technologies are gaining traction in plant molecular and developmental biology for elucidating the transcriptional changes across cell types in a specific tissue or organ, upon treatments, in response to biotic and abiotic stresses, or between genotypes. Despite the rapidly accelerating use of these technologies, collective and standardized experimental and analytical procedures to support the acquisition of high-quality data sets are still missing. In this commentary, we discuss common challenges associated with the use of single-cell transcriptomics in plants and propose general guidelines to improve reproducibility, quality, comparability, and interpretation and to make the data readily available to the community in this fast-developing field of research.
Affiliation(s)
- Carolin Grones
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent 9052, Belgium
  - VIB Centre for Plant Systems Biology, Ghent 9052, Belgium
- Thomas Eekhout
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent 9052, Belgium
  - VIB Centre for Plant Systems Biology, Ghent 9052, Belgium
  - VIB Single Cell Core Facility, Ghent 9052, Belgium
- Dongbo Shi
  - Centre for Organismal Studies, Heidelberg University, 69120 Heidelberg, Germany
  - Institute of Biochemistry and Biology, University of Potsdam, 14476 Potsdam, Germany
- Manuel Neumann
  - Institute of Biology, Humboldt-Universität zu Berlin, 10115 Berlin, Germany
- Lea S Berg
  - Institute of Plant Sciences, University of Bern, 3012 Bern, Switzerland
- Yuji Ke
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent 9052, Belgium
  - VIB Centre for Plant Systems Biology, Ghent 9052, Belgium
- Rachel Shahan
  - Department of Biology, Duke University, Durham, NC 27708, USA
  - Howard Hughes Medical Institute, Duke University, Durham, NC 27708, USA
- Kevin L Cox
  - Donald Danforth Plant Science Center, St. Louis, MO 63132, USA
- Fabio Gomez-Cano
  - Department of Molecular, Cellular, and Developmental Biology, University of Michigan, Ann Arbor, MI 48109, USA
- Hilde Nelissen
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent 9052, Belgium
  - VIB Centre for Plant Systems Biology, Ghent 9052, Belgium
- Jan U Lohmann
  - Centre for Organismal Studies, Heidelberg University, 69120 Heidelberg, Germany
- Stefania Giacomello
  - SciLifeLab, Department of Gene Technology, KTH Royal Institute of Technology, 17165 Solna, Sweden
- Olivier C Martin
  - Universities of Paris-Saclay, Paris-Cité and Evry, CNRS, INRAE, Institute of Plant Sciences Paris-Saclay, Gif-sur-Yvette 91192, France
- Benjamin Cole
  - DOE-Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Jia-Wei Wang
  - National Key Laboratory of Plant Molecular Genetics, CAS Center for Excellence in Molecular Plant Sciences (CEMPS), Institute of Plant Physiology and Ecology (SIPPE), Chinese Academy of Sciences (CAS), Shanghai 200032, China
- Kerstin Kaufmann
  - Institute of Biology, Humboldt-Universität zu Berlin, 10115 Berlin, Germany
- Michael T Raissig
  - Institute of Plant Sciences, University of Bern, 3012 Bern, Switzerland
- Gergo Palfalvi
  - Department of Comparative Development and Genetics, Max Planck Institute for Plant Breeding Research, 50829 Cologne, Germany
- Thomas Greb
  - Centre for Organismal Studies, Heidelberg University, 69120 Heidelberg, Germany
- Marc Libault
  - Division of Plant Science and Technology, Interdisciplinary Plant Group, College of Agriculture, Food, and Natural Resources, University of Missouri-Columbia, Columbia, MO 65201, USA
- Bert De Rybel
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent 9052, Belgium
  - VIB Centre for Plant Systems Biology, Ghent 9052, Belgium
|
7
|
Song T, Broadbent C, Kuang R. GNTD: reconstructing spatial transcriptomes with graph-guided neural tensor decomposition informed by spatial and functional relations. Nat Commun 2023; 14:8276. [PMID: 38092776 PMCID: PMC10719260 DOI: 10.1038/s41467-023-44017-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/01/2023] [Accepted: 11/15/2023] [Indexed: 12/17/2023] Open
Abstract
Spatially-resolved RNA profiling has now been widely used to understand cells' structural organizations and functional roles in tissues, yet it is challenging to reconstruct the whole spatial transcriptomes due to various inherent technical limitations in tissue section preparation and RNA capture and fixation in the application of the spatial RNA profiling technologies. Here, we introduce a graph-guided neural tensor decomposition (GNTD) model for reconstructing whole spatial transcriptomes in tissues. GNTD employs a hierarchical tensor structure and formulation to explicitly model the high-order spatial gene expression data with a hierarchical nonlinear decomposition in a three-layer neural network, enhanced by spatial relations among the capture spots and gene functional relations for accurate reconstruction from highly sparse spatial profiling data. Extensive experiments on 22 Visium spatial transcriptomics datasets and 3 high-resolution Stereo-seq datasets as well as simulation data demonstrate that GNTD consistently improves the imputation accuracy in cross-validations driven by nonlinear tensor decomposition and incorporation of spatial and functional information, and confirm that the imputed spatial transcriptomes provide a more complete gene expression landscape for downstream analyses of cell/spot clustering for tissue segmentation, and spatial gene expression clustering and visualizations.
Affiliation(s)
- Tianci Song
  - Department of Computer Science and Engineering, University of Minnesota Twin Cities, Minneapolis, 55414, MN, USA
- Charles Broadbent
  - Department of Computer Science and Engineering, University of Minnesota Twin Cities, Minneapolis, 55414, MN, USA
- Rui Kuang
  - Department of Computer Science and Engineering, University of Minnesota Twin Cities, Minneapolis, 55414, MN, USA.
|
8
|
Abstract
Missing values are a notable challenge when analyzing mass spectrometry-based proteomics data. While the field is still actively debating the best practices, the challenge increased with the emergence of mass spectrometry-based single-cell proteomics and the dramatic increase in missing values. A popular approach to deal with missing values is to perform imputation. Imputation has several drawbacks for which alternatives exist, but currently, imputation is still a practical solution widely adopted in single-cell proteomics data analysis. This perspective discusses the advantages and drawbacks of imputation. We also highlight 5 main challenges linked to missing value management in single-cell proteomics. Future developments should aim to solve these challenges, whether it is through imputation or data modeling. The perspective concludes with recommendations for reporting missing values, for reporting methods that deal with missing values, and for proper encoding of missing values.
Affiliation(s)
- Christophe Vanderaa
  - Computational Biology and Bioinformatics Unit (CBIO), de Duve Institute, UCLouvain, 1200 Brussels, Belgium
- Laurent Gatto
  - Computational Biology and Bioinformatics Unit (CBIO), de Duve Institute, UCLouvain, 1200 Brussels, Belgium
|
9
|
Jiang C, Hu L, Xu C, Ge Q, Zhao X. [Imputation method for dropout in single-cell transcriptome data]. SHENG WU YI XUE GONG CHENG XUE ZA ZHI = JOURNAL OF BIOMEDICAL ENGINEERING = SHENGWU YIXUE GONGCHENGXUE ZAZHI 2023; 40:778-783. [PMID: 37666769 PMCID: PMC10477391 DOI: 10.7507/1001-5515.202301009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2023] [Revised: 07/27/2023] [Indexed: 09/06/2023]
Abstract
Single-cell transcriptome sequencing (scRNA-seq) can resolve the expression characteristics of cells in tissues with single-cell precision, enabling researchers to quantify cellular heterogeneity within populations with higher resolution, revealing potentially heterogeneous cell populations and the dynamics of complex tissues. However, the presence of a large number of technical zeros in scRNA-seq data will have an impact on downstream analysis of cell clustering, differential genes, cell annotation, and pseudotime, hindering the discovery of meaningful biological signals. The main idea to solve this problem is to make use of the potential correlation between cells and genes, and to impute the technical zeros through the observed data. Based on this, this paper reviewed the basic methods of imputing technical zeros in the scRNA-seq data and discussed the advantages and disadvantages of the existing methods. Finally, recommendations and perspectives on the use and development of the method were provided.
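The idea summarized in this abstract, borrowing information from correlated cells to fill in zeros, can be illustrated with a minimal sketch. It is not taken from the reviewed paper: the function name `knn_impute_zeros`, the parameter `k`, and the choice of Pearson correlation on log-transformed counts are all assumptions made for illustration, and the sketch naively treats every zero as a technical zero, which real methods try to avoid.

```python
import numpy as np


def knn_impute_zeros(counts, k=2):
    """Fill zeros in a cells x genes matrix from the k most correlated cells.

    Hypothetical helper for illustration only: every zero is treated as a
    technical zero, which is a simplifying assumption.
    """
    x = np.log1p(np.asarray(counts, dtype=float))  # stabilise variance
    corr = np.corrcoef(x)                          # cell-cell Pearson correlation
    np.fill_diagonal(corr, -np.inf)                # a cell is not its own neighbour
    imputed = x.copy()
    for i in range(x.shape[0]):
        neighbours = np.argsort(corr[i])[-k:]      # indices of the k most similar cells
        neighbour_mean = x[neighbours].mean(axis=0)
        zero_mask = x[i] == 0
        imputed[i, zero_mask] = neighbour_mean[zero_mask]
    return np.expm1(imputed)                       # back to the count scale


# Toy example: cell 0 has a zero for gene 1, while two similar cells express it.
counts = np.array([
    [10, 0, 5, 8],
    [9, 4, 6, 7],
    [11, 5, 4, 9],
    [0, 20, 1, 0],
])
result = knn_impute_zeros(counts, k=2)
```

Observed non-zero values are left unchanged and only the zero entries are replaced, so the amount of smoothing is controlled by `k` alone in this sketch.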
Affiliation(s)
- Chao Jiang (姜超)
  - State Key Laboratory of Bioelectronics, School of Biological Sciences and Medical Engineering, Southeast University, Nanjing 210096, P. R. China
  - Singleron Biotech Co., Ltd, Nanjing 210018, P. R. China
- Longfei Hu (胡龙飞)
  - State Key Laboratory of Bioelectronics, School of Biological Sciences and Medical Engineering, Southeast University, Nanjing 210096, P. R. China
- Chunxiang Xu (徐春祥)
  - State Key Laboratory of Bioelectronics, School of Biological Sciences and Medical Engineering, Southeast University, Nanjing 210096, P. R. China
- Qinyu Ge (葛芹玉)
  - State Key Laboratory of Bioelectronics, School of Biological Sciences and Medical Engineering, Southeast University, Nanjing 210096, P. R. China
- Xiangwei Zhao (赵祥伟)
  - State Key Laboratory of Bioelectronics, School of Biological Sciences and Medical Engineering, Southeast University, Nanjing 210096, P. R. China
|
10
|
Pandey D, Onkara PP. Improved downstream functional analysis of single-cell RNA-sequence data using DGAN. Sci Rep 2023; 13:1618. [PMID: 36709340 PMCID: PMC9884242 DOI: 10.1038/s41598-023-28952-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/10/2022] [Accepted: 01/27/2023] [Indexed: 01/29/2023] Open
Abstract
The dramatic increase in the number of single-cell RNA-sequence (scRNA-seq) investigations is indeed an endorsement of the new-fangled proficiencies of next generation sequencing technologies that facilitate the accurate measurement of tens of thousands of RNA expression levels at the cellular resolution. Nevertheless, missing values of RNA amplification persist and remain as a significant computational challenge, as these data omission induce further noise in their respective cellular data and ultimately impede downstream functional analysis of scRNA-seq data. Consequently, it turns imperative to develop robust and efficient scRNA-seq data imputation methods for improved downstream functional analysis outcomes. To overcome this adversity, we have designed an imputation framework namely deep generative autoencoder network [DGAN]. In essence, DGAN is an evolved variational autoencoder designed to robustly impute data dropouts in scRNA-seq data manifested as a sparse gene expression matrix. DGAN principally reckons count distribution, besides data sparsity utilizing a gaussian model whereby, cell dependencies are capitalized to detect and exclude outlier cells via imputation. When tested on five publicly available scRNA-seq data, DGAN outperformed every single baseline method paralleled, with respect to downstream functional analysis including cell data visualization, clustering, classification and differential expression analysis. DGAN is executed in Python and is accessible at https://github.com/dikshap11/DGAN .
Affiliation(s)
- Diksha Pandey
  - Department of Biotechnology, National Institute of Technology, Warangal, India
- Perumal P Onkara
  - Department of Biotechnology, National Institute of Technology, Warangal, India.
|