1. Franchini M, Pellecchia S, Viscido G, Gambardella G. Single-cell gene set enrichment analysis and transfer learning for functional annotation of scRNA-seq data. NAR Genom Bioinform 2023; 5:lqad024. [PMID: 36879897 PMCID: PMC9985338 DOI: 10.1093/nargab/lqad024] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 10/19/2022] [Revised: 01/16/2023] [Accepted: 02/20/2023] [Indexed: 03/07/2023] [Open access]
Abstract
Although an essential step, cell functional annotation often proves particularly challenging from single-cell transcriptional data. Several methods have been developed to accomplish this task. However, in most cases, these rely on techniques initially developed for bulk RNA sequencing or simply make use of marker genes identified from cell clustering followed by supervised annotation. To overcome these limitations and automatize the process, we have developed two novel methods, the single-cell gene set enrichment analysis (scGSEA) and the single-cell mapper (scMAP). scGSEA combines latent data representations and gene set enrichment scores to detect coordinated gene activity at single-cell resolution. scMAP uses transfer learning techniques to re-purpose and contextualize new cells into a reference cell atlas. Using both simulated and real datasets, we show that scGSEA effectively recapitulates recurrent patterns of pathways' activity shared by cells from different experimental conditions. At the same time, we show that scMAP can reliably map and contextualize new single-cell profiles on a breast cancer atlas we recently released. Both tools are provided in an effective and straightforward workflow providing a framework to determine cell function and significantly improve annotation and interpretation of scRNA-seq data.
Affiliation(s)
- Melania Franchini: Telethon Institute of Genetics and Medicine, Pozzuoli 80078 Naples, Italy; Department of Electrical Engineering and Information Technologies, University of Naples Federico II, 80125 Naples, Italy
- Simona Pellecchia: Telethon Institute of Genetics and Medicine, Pozzuoli 80078 Naples, Italy
- Gaetano Viscido: Telethon Institute of Genetics and Medicine, Pozzuoli 80078 Naples, Italy
- Gennaro Gambardella: Telethon Institute of Genetics and Medicine, Pozzuoli 80078 Naples, Italy; Department of Chemical Materials and Industrial Engineering, University of Naples Federico II, 80125 Naples, Italy

2. Tang Z, Yu Z, Wang C. A fast iterative algorithm for high-dimensional differential network. Comput Stat 2019. [DOI: 10.1007/s00180-019-00915-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]

3. Gambardella G, di Bernardo D. A Tool for Visualization and Analysis of Single-Cell RNA-Seq Data Based on Text Mining. Front Genet 2019; 10:734. [PMID: 31447887 PMCID: PMC6696874 DOI: 10.3389/fgene.2019.00734] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 03/20/2019] [Accepted: 07/12/2019] [Indexed: 11/28/2022] [Open access]
Abstract
Gene expression in individual cells can now be measured for thousands of cells in a single experiment thanks to innovative sample-preparation and sequencing technologies. State-of-the-art computational pipelines for single-cell RNA-sequencing data, however, still employ computational methods that were developed for traditional bulk RNA-sequencing data, thus not accounting for the peculiarities of single-cell data, such as sparseness and zero-inflated counts. Here, we present a ready-to-use pipeline named gf-icf (gene frequency–inverse cell frequency) for normalization of raw counts, feature selection, and dimensionality reduction of scRNA-seq data for their visualization and subsequent analyses. Our work is based on a data transformation model named term frequency–inverse document frequency (TF-IDF), which has been extensively used in the field of text mining where extremely sparse and zero-inflated data are common. Using benchmark scRNA-seq datasets, we show that the gf-icf pipeline outperforms existing state-of-the-art methods in terms of improved visualization and ability to separate and distinguish different cell types.
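The TF-IDF idea behind gf-icf can be illustrated with a minimal Python sketch. It is not the gf-icf package's implementation: the function name `gf_icf_sketch`, the smoothed inverse-cell-frequency formula, and the toy count matrix are all assumptions chosen for illustration only.

```python
import math

def gf_icf_sketch(counts):
    """Weight a cells x genes matrix of raw counts with a TF-IDF-style score.

    Hypothetical helper for illustration; the smoothing used here is a common
    TF-IDF variant and is an assumption, not the published gf-icf formula.
    """
    n_cells = len(counts)
    n_genes = len(counts[0])
    # Number of cells in which each gene is detected (count > 0).
    cells_with_gene = [
        sum(1 for cell in counts if cell[g] > 0) for g in range(n_genes)
    ]
    # Inverse cell frequency: genes seen in few cells get a larger weight.
    icf = [
        math.log((n_cells + 1) / (cells_with_gene[g] + 1)) + 1.0
        for g in range(n_genes)
    ]
    weighted = []
    for cell in counts:
        total = sum(cell)
        # Gene frequency: each count as a fraction of the cell's total counts.
        gf = [c / total if total > 0 else 0.0 for c in cell]
        weighted.append([gf[g] * icf[g] for g in range(n_genes)])
    return weighted

# Toy matrix (assumed data): 3 cells x 3 genes.
# Gene 0 is detected in every cell, gene 2 in only one.
toy_counts = [[10, 0, 5], [8, 2, 0], [9, 0, 0]]
scores = gf_icf_sketch(toy_counts)
```

In this toy matrix the ubiquitous gene 0 keeps its plain frequency, while the rarely detected gene 2 is up-weighted, which is the property that makes TF-IDF-style weighting attractive for sparse, zero-inflated counts.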
Affiliation(s)
- Gennaro Gambardella: University of Naples Federico II, Department of Chemical Materials and Industrial Engineering, Naples, Italy; Telethon Institute of Genetics and Medicine, Naples, Italy
- Diego di Bernardo: University of Naples Federico II, Department of Chemical Materials and Industrial Engineering, Naples, Italy; Telethon Institute of Genetics and Medicine, Naples, Italy

4. Singh AJ, Ramsey SA, Filtz TM, Kioussi C. Differential gene regulatory networks in development and disease. Cell Mol Life Sci 2018; 75:1013-1025. [PMID: 29018868 PMCID: PMC11105524 DOI: 10.1007/s00018-017-2679-6] [Citation(s) in RCA: 52] [Impact Index Per Article: 7.4] [Received: 07/20/2017] [Revised: 09/19/2017] [Accepted: 10/04/2017] [Indexed: 02/02/2023]
Abstract
Gene regulatory networks, in which differential expression of regulator genes induces differential expression of their target genes, underlie diverse biological processes such as embryonic development, organ formation and disease pathogenesis. An archetypical systems biology approach to mapping these networks involves the combined application of (1) high-throughput sequencing-based transcriptome profiling (RNA-seq) of biopsies under diverse network perturbations and (2) network inference based on gene-gene expression correlation analysis. The comparative analysis of such correlation networks across cell types or states, known as differential correlation network analysis, can identify specific molecular signatures and functional modules that underlie the state transition or have context-specific function. Here, we review the basic concepts of network biology and correlation network inference, and the prevailing methods for differential analysis of correlation networks. We discuss applications of gene expression network analysis in the context of embryonic development, cancer, and congenital diseases.
Affiliation(s)
- Arun J Singh: Department of Pharmaceutical Sciences, College of Pharmacy, Oregon State University, Corvallis, OR, 97331, USA
- Stephen A Ramsey: Department of Biomedical Sciences, College of Veterinary Medicine, Oregon State University, Corvallis, OR, 97331, USA; School of Electrical Engineering and Computer Science, College of Engineering, Oregon State University, Corvallis, OR, 97331, USA
- Theresa M Filtz: Department of Pharmaceutical Sciences, College of Pharmacy, Oregon State University, Corvallis, OR, 97331, USA
- Chrissa Kioussi: Department of Pharmaceutical Sciences, College of Pharmacy, Oregon State University, Corvallis, OR, 97331, USA

5. Gambardella G, Carissimo A, Chen A, Cutillo L, Nowakowski TJ, di Bernardo D, Blelloch R. The impact of microRNAs on transcriptional heterogeneity and gene co-expression across single embryonic stem cells. Nat Commun 2017; 8:14126. [PMID: 28102192 PMCID: PMC5253645 DOI: 10.1038/ncomms14126] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 05/25/2016] [Accepted: 12/01/2016] [Indexed: 12/21/2022] [Open access]
Abstract
MicroRNAs act posttranscriptionally to suppress multiple target genes within a cell population. To what extent this multi-target suppression occurs in individual cells and how it impacts transcriptional heterogeneity and gene co-expression remains unknown. Here we used single-cell sequencing combined with introduction of individual microRNAs. miR-294 and let-7c were introduced into otherwise microRNA-deficient Dgcr8 knockout mouse embryonic stem cells. Both microRNAs induce suppression and correlated expression of their respective gene targets. The two microRNAs had opposing effects on transcriptional heterogeneity within the cell population, with let-7c increasing and miR-294 decreasing the heterogeneity between cells. Furthermore, let-7c promotes, whereas miR-294 suppresses, the phasing of cell cycle genes. These results show at the individual cell level how a microRNA simultaneously impacts its many targets and how that in turn can influence a population of cells. The findings have important implications for understanding how microRNAs influence the co-expression of genes and pathways, and thus ultimately cell fate. MicroRNAs can posttranscriptionally repress multiple targets in a cell population. Here the authors use single-cell sequencing to investigate the effects of an individual miRNA on transcriptional heterogeneity and gene co-expression.
Affiliation(s)
- Amy Chen: The Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, Center for Reproductive Sciences, University of California, San Francisco, San Francisco, California 94143, USA; Department of Urology, University of California, San Francisco, San Francisco, California 94143, USA
- Luisa Cutillo: Telethon Institute of Genetics and Medicine, Pozzuoli, 80078 Naples, Italy
- Tomasz J Nowakowski: The Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, Center for Reproductive Sciences, University of California, San Francisco, San Francisco, California 94143, USA
- Diego di Bernardo: Telethon Institute of Genetics and Medicine, Pozzuoli, 80078 Naples, Italy; Department of Chemical, Materials and Industrial Engineering, University of Naples 'Federico II', 80125 Naples, Italy
- Robert Blelloch: The Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, Center for Reproductive Sciences, University of California, San Francisco, San Francisco, California 94143, USA; Department of Urology, University of California, San Francisco, San Francisco, California 94143, USA

6. Differential network analysis reveals the genome-wide landscape of estrogen receptor modulation in hormonal cancers. Sci Rep 2016; 6:23035. [PMID: 26972162 PMCID: PMC4789788 DOI: 10.1038/srep23035] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 09/16/2015] [Accepted: 02/23/2016] [Indexed: 12/14/2022] [Open access]
Abstract
Several mutual information (MI)-based algorithms have been developed to identify dynamic gene-gene and function-function interactions governed by key modulators (genes, proteins, etc.). Due to intensive computation, however, these methods rely heavily on prior knowledge and are limited in genome-wide analysis. We present the modulated gene/gene set interaction (MAGIC) analysis to systematically identify genome-wide modulation of interaction networks. Based on a novel statistical test employing conjugate Fisher transformations of correlation coefficients, MAGIC features fast computation and adaptation to variations in clinical cohorts. In simulated datasets, MAGIC achieved greatly improved computational efficiency and overall superior performance compared with the MI-based method. We applied MAGIC to construct the estrogen receptor (ER) modulated gene and gene set (representing biological function) interaction networks in breast cancer. Several novel interaction hubs and functional interactions were discovered. ER+ dependent interaction between TGFβ and NFκB was further shown to be associated with patient survival. The findings were verified in independent datasets. Using MAGIC, we also assessed the essential roles of ER modulation in another hormonal cancer, ovarian cancer. Overall, MAGIC is a systematic framework for comprehensively identifying and constructing the modulated interaction networks in a whole-genome landscape. A MATLAB implementation of MAGIC is available for academic use at https://github.com/chiuyc/MAGIC.
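To make the role of the Fisher transformation concrete, the following Python sketch implements the classic two-sample Fisher z-test for a difference between two independent Pearson correlations. It is a textbook stand-in, not MAGIC's own statistic (the abstract does not give its formula), and the function name `fisher_z_difference` and the example correlations and sample sizes are hypothetical.

```python
import math

def fisher_z_difference(r1, n1, r2, n2):
    """Two-sided test for a difference between two independent correlations.

    Classic Fisher z-test, used here only as an illustration; this is an
    assumption and not the test statistic defined by MAGIC.
    r1, r2 must lie strictly between -1 and 1; n1, n2 are sample sizes.
    """
    if n1 <= 3 or n2 <= 3:
        raise ValueError("each group needs more than 3 samples")
    # Fisher transformation: atanh(r) is approximately normal
    # with variance 1 / (n - 3).
    z1 = math.atanh(r1)
    z2 = math.atanh(r2)
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / standard_error
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

# Hypothetical example: one gene pair measured in two groups of samples
# (for instance, modulator-positive versus modulator-negative tumours).
z_same, p_same = fisher_z_difference(0.5, 50, 0.5, 50)
z_diff, p_diff = fisher_z_difference(0.8, 100, 0.1, 100)
```

Identical correlations give a z-score of zero, whereas a strong correlation in one group and a weak one in the other yields a large z-score, flagging the gene pair as a candidate modulated interaction under this simplified test.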