1. Zhu H, Slonim D. From Noise to Knowledge: Diffusion Probabilistic Model-Based Neural Inference of Gene Regulatory Networks. J Comput Biol 2024; 31:1087-1103. [PMID: 39387266 PMCID: PMC11698671 DOI: 10.1089/cmb.2024.0607]
Abstract
Understanding gene regulatory networks (GRNs) is crucial for elucidating cellular mechanisms and advancing therapeutic interventions. Earlier methods for GRN inference from bulk expression data often struggled with the high dimensionality and inherent noise in the data. Here we introduce RegDiffusion, a new class of Denoising Diffusion Probabilistic Models focusing on the regulatory effects among feature variables. RegDiffusion introduces Gaussian noise to the input data following a diffusion schedule and uses a neural network with a parameterized adjacency matrix to predict the added noise. We show that, using this process, GRNs can be learned effectively with a surprisingly simple model architecture. In our benchmark experiments, RegDiffusion shows superior performance compared to several baseline methods in multiple datasets. We also demonstrate that RegDiffusion can infer biologically meaningful regulatory networks from real-world single-cell datasets with over 15,000 genes in under 5 minutes. This work not only introduces a fresh perspective on GRN inference but also highlights the promising capacity of diffusion-based models in the area of single-cell analysis. The RegDiffusion software package and experiment data are available at https://github.com/TuftsBCB/RegDiffusion.
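The forward (noising) half of a denoising diffusion model, as summarized above, can be sketched in a few lines. This is a minimal generic DDPM illustration under stated assumptions, not RegDiffusion's code: the linear β schedule from 1e-4 to 0.02 over 1,000 steps is a common default assumed here, and the random matrix stands in for a hypothetical standardized cells-by-genes expression matrix. The closed form x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, with ᾱ_t the cumulative product of (1 − β_t), lets any step t be sampled directly.

```python
import numpy as np


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances beta_t, increasing linearly (an assumed default)."""
    return np.linspace(beta_start, beta_end, num_steps)


def forward_noise(x0, t, alpha_bar, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form; return x_t and the noise used."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps


betas = linear_beta_schedule(1000)
alpha_bar = np.cumprod(1.0 - betas)  # alpha_bar[t] = product of (1 - beta_s) for s <= t

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 5))  # hypothetical: 8 cells x 5 genes, standardized
x_t, eps = forward_noise(x0, t=500, alpha_bar=alpha_bar, rng=rng)

# A denoiser would be trained to predict eps from (x_t, t), e.g. with MSE:
eps_hat = np.zeros_like(eps)  # placeholder prediction, for illustration only
loss = np.mean((eps - eps_hat) ** 2)
```

In training, a network would receive x_t and t and be penalized for the gap between its predicted noise and `eps`; how RegDiffusion places its parameterized adjacency matrix inside that network is not reproduced here.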
Affiliation(s)
- Hao Zhu
- Department of Computer Science, Tufts University, Medford, Massachusetts, USA
- Donna Slonim
- Department of Computer Science, Tufts University, Medford, Massachusetts, USA

2. Zhang W, Huckaby B, Talburt J, Weissman S, Yang MQ. cnnImpute: missing value recovery for single cell RNA sequencing data. Sci Rep 2024; 14:3946. [PMID: 38365936 PMCID: PMC10873334 DOI: 10.1038/s41598-024-53998-x] [Received: 11/06/2023] [Accepted: 02/07/2024]
Abstract
The advent of single-cell RNA sequencing (scRNA-seq) technology has revolutionized our ability to explore cellular diversity and unravel the complexities of intricate diseases. However, due to the inherently low signal-to-noise ratio and the presence of an excessive number of missing values, scRNA-seq data analysis encounters unique challenges. Here, we present cnnImpute, a novel convolutional neural network (CNN) based method designed to address the issue of missing data in scRNA-seq. Our approach starts by estimating missing probabilities, followed by constructing a CNN-based model to recover expression values with a high likelihood of being missing. Through comprehensive evaluations, cnnImpute demonstrates its effectiveness in accurately imputing missing values while preserving the integrity of cell clusters in scRNA-seq data analysis. It achieved superior performance in various benchmarking experiments. cnnImpute offers an accurate and scalable method for recovering missing values, providing a useful resource for scRNA-seq data analysis.
Affiliation(s)
- Wenjuan Zhang
- MidSouth Bioinformatics Center and Joint Bioinformatics Graduate Program, University of Arkansas at Little Rock, University of Arkansas for Medical Sciences, Little Rock, 72204, AR, USA
- Department of Information Science, University of Arkansas at Little Rock, Little Rock, 72204, AR, USA
- Brandon Huckaby
- Department of Computer Science, University of Arkansas at Little Rock, Little Rock, 72204, AR, USA
- John Talburt
- Department of Information Science, University of Arkansas at Little Rock, Little Rock, 72204, AR, USA
- Sherman Weissman
- Department of Genetics, Yale School of Medicine, New Haven, 06520, CT, USA
- Mary Qu Yang
- MidSouth Bioinformatics Center and Joint Bioinformatics Graduate Program, University of Arkansas at Little Rock, University of Arkansas for Medical Sciences, Little Rock, 72204, AR, USA
- Department of Information Science, University of Arkansas at Little Rock, Little Rock, 72204, AR, USA

3. Vanheer L, Fantuzzi F, To SK, Schiavo A, Van Haele M, Ostyn T, Haesen T, Yi X, Janiszewski A, Chappell J, Rihoux A, Sawatani T, Roskams T, Pattou F, Kerr-Conte J, Cnop M, Pasque V. Inferring regulators of cell identity in the human adult pancreas. NAR Genom Bioinform 2023; 5:lqad068. [PMID: 37435358 PMCID: PMC10331937 DOI: 10.1093/nargab/lqad068] [Received: 12/20/2022] [Revised: 06/17/2023] [Accepted: 06/28/2023]
Abstract
Cellular identity during development is under the control of transcription factors that form gene regulatory networks. However, the transcription factors and gene regulatory networks underlying cellular identity in the human adult pancreas remain largely unexplored. Here, we integrate multiple single-cell RNA-sequencing datasets of the human adult pancreas, totaling 7393 cells, and comprehensively reconstruct gene regulatory networks. We show that a network of 142 transcription factors forms distinct regulatory modules that characterize pancreatic cell types. We present evidence that our approach identifies regulators of cell identity and cell states in the human adult pancreas. We predict that HEYL, BHLHE41 and JUND are active in acinar, beta and alpha cells, respectively, and show that these proteins are present in the human adult pancreas as well as in human induced pluripotent stem cell (hiPSC)-derived islet cells. Using single-cell transcriptomics, we found that JUND represses beta cell genes in hiPSC-alpha cells. BHLHE41 depletion induced apoptosis in primary pancreatic islets. The comprehensive gene regulatory network atlas can be explored interactively online. We anticipate our analysis to be the starting point for a more sophisticated dissection of how transcription factors regulate cell identity and cell states in the human adult pancreas.
Affiliation(s)
- Lotte Vanheer
- Department of Development and Regeneration; KU Leuven - University of Leuven; Single-cell Omics Institute and Leuven Stem Cell Institute, Herestraat 49, B-3000 Leuven, Belgium
- Federica Fantuzzi
- ULB Center for Diabetes Research; Université Libre de Bruxelles; Route de Lennik 808, B-1070 Brussels, Belgium
- San Kit To
- Department of Development and Regeneration; KU Leuven - University of Leuven; Single-cell Omics Institute and Leuven Stem Cell Institute, Herestraat 49, B-3000 Leuven, Belgium
- Andrea Schiavo
- ULB Center for Diabetes Research; Université Libre de Bruxelles; Route de Lennik 808, B-1070 Brussels, Belgium
- Matthias Van Haele
- Department of Imaging and Pathology; Translational Cell and Tissue Research, KU Leuven and University Hospitals Leuven; Herestraat 49, B-3000 Leuven, Belgium
- Tessa Ostyn
- Department of Imaging and Pathology; Translational Cell and Tissue Research, KU Leuven and University Hospitals Leuven; Herestraat 49, B-3000 Leuven, Belgium
- Tine Haesen
- Department of Development and Regeneration; KU Leuven - University of Leuven; Single-cell Omics Institute and Leuven Stem Cell Institute, Herestraat 49, B-3000 Leuven, Belgium
- Xiaoyan Yi
- ULB Center for Diabetes Research; Université Libre de Bruxelles; Route de Lennik 808, B-1070 Brussels, Belgium
- Adrian Janiszewski
- Department of Development and Regeneration; KU Leuven - University of Leuven; Single-cell Omics Institute and Leuven Stem Cell Institute, Herestraat 49, B-3000 Leuven, Belgium
- Joel Chappell
- Department of Development and Regeneration; KU Leuven - University of Leuven; Single-cell Omics Institute and Leuven Stem Cell Institute, Herestraat 49, B-3000 Leuven, Belgium
- Adrien Rihoux
- Department of Development and Regeneration; KU Leuven - University of Leuven; Single-cell Omics Institute and Leuven Stem Cell Institute, Herestraat 49, B-3000 Leuven, Belgium
- Toshiaki Sawatani
- ULB Center for Diabetes Research; Université Libre de Bruxelles; Route de Lennik 808, B-1070 Brussels, Belgium
- Tania Roskams
- Department of Imaging and Pathology; Translational Cell and Tissue Research, KU Leuven and University Hospitals Leuven; Herestraat 49, B-3000 Leuven, Belgium
- Francois Pattou
- University of Lille, Inserm, CHU Lille, Institute Pasteur Lille, U1190-EGID, F-59000 Lille, France
- European Genomic Institute for Diabetes, F-59000 Lille, France
- University of Lille, F-59000 Lille, France
- Julie Kerr-Conte
- University of Lille, Inserm, CHU Lille, Institute Pasteur Lille, U1190-EGID, F-59000 Lille, France
- European Genomic Institute for Diabetes, F-59000 Lille, France
- University of Lille, F-59000 Lille, France
- Miriam Cnop
- ULB Center for Diabetes Research; Université Libre de Bruxelles; Route de Lennik 808, B-1070 Brussels, Belgium
- Division of Endocrinology; Erasmus Hospital, Université Libre de Bruxelles; Route de Lennik 808, B-1070 Brussels, Belgium
- Vincent Pasque
- Department of Development and Regeneration; KU Leuven - University of Leuven; Single-cell Omics Institute and Leuven Stem Cell Institute, Herestraat 49, B-3000 Leuven, Belgium

4. Identification of Human Global, Tissue and Within-Tissue Cell-Specific Stably Expressed Genes at Single-Cell Resolution. Int J Mol Sci 2022; 23:ijms231810214. [PMID: 36142130 PMCID: PMC9499411 DOI: 10.3390/ijms231810214] [Received: 07/13/2022] [Revised: 08/12/2022] [Accepted: 08/30/2022]
Abstract
Stably Expressed Genes (SEGs) are a set of genes with invariant expression. Identification of SEGs, especially among both healthy and diseased tissues, is clinically relevant because it enables more accurate data integration, gene expression comparison and biomarker detection. However, it remains unclear how many global SEGs there are, whether there are development-, tissue- or cell-specific SEGs, and whether diseases can influence their expression. In this research, we systematically investigate human SEGs at the single-cell level and examine their development-, tissue- and cell-specificity, and their expression stability under various disease states. A hierarchical strategy is proposed to identify a list of 408 spatial-temporal SEGs. Development-specific SEGs are also identified, with adult tissue-specific SEGs enriched in immune processes and fetal tissue-specific SEGs enriched in RNA splicing activities. Cells of the same type within different tissues tend to show similar SEG composition profiles. Diseases or stresses do not appear to influence the expression stability of SEGs in various tissues. Beyond their use as markers and internal references for data normalization and integration, we examine another possible application of SEGs: cell-type deconvolution. The deconvolution model could accurately predict the fractions of major immune cells in multiple independent testing datasets of peripheral blood samples. The study provides a reliable list of human SEGs at the single-cell level, improves understanding of the properties of SEGs, and extends their possible applications.

5. Chowdhury HA, Bhattacharyya DK, Kalita JK. UICPC: Centrality-based clustering for scRNA-seq data analysis without user input. Comput Biol Med 2021; 137:104820. [PMID: 34508973 DOI: 10.1016/j.compbiomed.2021.104820] [Received: 06/02/2021] [Revised: 08/24/2021] [Accepted: 08/27/2021]
Abstract
scRNA-seq data analysis enables new possibilities for identification of novel cells, specific characterization of known cells and study of cell heterogeneity. The performance of most clustering methods specifically developed for scRNA-seq is greatly influenced by user input. We propose a centrality-based clustering method named UICPC and compare its performance with 9 state-of-the-art clustering methods on 11 real-world scRNA-seq datasets to demonstrate its effectiveness and usefulness in discovering cell groups. Our method does not require user input. However, it does require threshold settings, which were benchmarked through extensive experiments. We observe that most compared approaches show poor performance due to high heterogeneity and large dataset dimensions, whereas UICPC shows excellent performance in terms of NMI, Purity and ARI. UICPC is available as an R package and can be downloaded from https://sites.google.com/view/hussinchowdhury/software.
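Of the three evaluation metrics named above, Purity has the simplest definition: each cluster is credited with its most frequent reference cell type, and the score is the fraction of cells that fall into that majority. The sketch below follows this standard definition; it is not taken from the UICPC package, and the cell-type and cluster labels are hypothetical.

```python
from collections import Counter


def purity(true_labels, cluster_labels):
    """Fraction of cells whose true label matches the majority label of their cluster."""
    if len(true_labels) != len(cluster_labels):
        raise ValueError("label sequences must have the same length")
    if len(true_labels) == 0:
        raise ValueError("need at least one cell")
    members = {}
    for truth, cluster in zip(true_labels, cluster_labels):
        members.setdefault(cluster, []).append(truth)
    # Each cluster contributes the count of its most common true label.
    majority_total = sum(
        Counter(labels).most_common(1)[0][1] for labels in members.values()
    )
    return majority_total / len(true_labels)


cell_types = ["T", "T", "T", "B", "B", "NK"]  # hypothetical reference annotations
clusters = [0, 0, 1, 1, 1, 2]                 # hypothetical clustering output
print(purity(cell_types, clusters))  # 5 of 6 cells match their cluster's majority type
```

Purity rewards over-segmentation (placing every cell in its own cluster scores 1), which is one reason it is usually reported alongside NMI and ARI; scikit-learn provides those two as `normalized_mutual_info_score` and `adjusted_rand_score` in `sklearn.metrics`.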
Affiliation(s)
- Jugal Kumar Kalita
- Computer Science, College of Engineering and Applied Science, University of Colorado, Colorado Springs, CO, 80933-7150, USA

6. Lin Y, Ghazanfar S, Strbenac D, Wang A, Patrick E, Lin DM, Speed T, Yang JYH, Yang P. Evaluating stably expressed genes in single cells. Gigascience 2019; 8:giz106. [PMID: 31531674 PMCID: PMC6748759 DOI: 10.1093/gigascience/giz106] [Received: 11/22/2018] [Revised: 05/22/2019] [Accepted: 08/09/2019]
Abstract
BACKGROUND Single-cell RNA-seq (scRNA-seq) profiling has revealed remarkable variation in transcription, suggesting that expression of many genes at the single-cell level is intrinsically stochastic and noisy. Yet, on the cell population level, a subset of genes traditionally referred to as housekeeping genes (HKGs) are found to be stably expressed in different cell and tissue types. It is therefore critical to ask whether stably expressed genes (SEGs) can be identified on the single-cell level and, if so, how their expression stability can be assessed. We have previously proposed a computational framework for ranking expression stability of genes in single cells for scRNA-seq data normalization and integration. In this study, we perform detailed evaluation and characterization of SEGs derived from this framework. RESULTS Here, we show that gene expression stability indices derived from the early human and mouse development scRNA-seq datasets and the "Mouse Atlas" dataset are reproducible and conserved across species. We demonstrate that SEGs identified from single cells based on their stability indices are considerably more stable than HKGs defined previously from cell populations across diverse biological systems. Our analyses indicate that SEGs are inherently more stable at the single-cell level and that their characteristics are reminiscent of HKGs, suggesting their potential role in sustaining essential functions in individual cells. CONCLUSIONS SEGs identified in this study have immediate utility both for understanding variation and stability of single-cell transcriptomes and for practical applications such as scRNA-seq data normalization. Our framework for calculating gene stability index, "scSEGIndex," is incorporated into the scMerge Bioconductor R package (https://sydneybiox.github.io/scMerge/reference/scSEGIndex.html) and can be used for identifying genes with stable expression in scRNA-seq datasets.
Affiliation(s)
- Yingxin Lin
- School of Mathematics and Statistics, University of Sydney, Sydney, NSW 2006, Australia
- Shila Ghazanfar
- School of Mathematics and Statistics, University of Sydney, Sydney, NSW 2006, Australia
- Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK
- Dario Strbenac
- School of Mathematics and Statistics, University of Sydney, Sydney, NSW 2006, Australia
- Andy Wang
- School of Mathematics and Statistics, University of Sydney, Sydney, NSW 2006, Australia
- Sydney Medical School, University of Sydney, Sydney, NSW 2006, Australia
- Ellis Patrick
- School of Mathematics and Statistics, University of Sydney, Sydney, NSW 2006, Australia
- Westmead Institute for Medical Research, University of Sydney, Westmead, NSW 2145, Australia
- David M Lin
- Department of Biomedical Sciences, Cornell University, Ithaca, NY 14853, USA
- Terence Speed
- Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, VIC 3052, Australia
- Department of Mathematics and Statistics, University of Melbourne, Melbourne, VIC 3010, Australia
- Jean Y H Yang
- School of Mathematics and Statistics, University of Sydney, Sydney, NSW 2006, Australia
- Pengyi Yang
- School of Mathematics and Statistics, University of Sydney, Sydney, NSW 2006, Australia
- Computational Systems Biology Group, Children’s Medical Research Institute, University of Sydney, Westmead, NSW 2145, Australia

7. Ancient animal genome architecture reflects cell type identities. Nat Ecol Evol 2019; 3:1289-1293. [PMID: 31383947 DOI: 10.1038/s41559-019-0946-7] [Received: 11/09/2018] [Accepted: 06/13/2019]
Abstract
The level of conservation of ancient metazoan gene order (synteny) is remarkable. Despite this, the functionality of the vast majority of such regions in metazoan genomes remains elusive. Utilizing recently published single-cell expression data from several anciently diverging metazoan species, we reveal the level of correspondence between cell types and genomic synteny, identifying genomic regions conferring ancient cell type identity.

8. Lin Y, Ghazanfar S, Wang KYX, Gagnon-Bartsch JA, Lo KK, Su X, Han ZG, Ormerod JT, Speed TP, Yang P, Yang JYH. scMerge leverages factor analysis, stable expression, and pseudoreplication to merge multiple single-cell RNA-seq datasets. Proc Natl Acad Sci U S A 2019; 116:9775-9784. [PMID: 31028141 PMCID: PMC6525515 DOI: 10.1073/pnas.1820006116]
Abstract
Concerted examination of multiple collections of single-cell RNA sequencing (RNA-seq) data promises further biological insights that cannot be uncovered with individual datasets. Here we present scMerge, an algorithm that integrates multiple single-cell RNA-seq datasets using factor analysis of stably expressed genes and pseudoreplicates across datasets. Using a large collection of public datasets, we benchmark scMerge against published methods and demonstrate that it consistently provides improved cell type separation by removing unwanted factors; scMerge can also enhance biological discovery through robust data integration, which we show through the inference of development trajectory in a liver dataset collection.
Affiliation(s)
- Yingxin Lin
- School of Mathematics and Statistics, University of Sydney, Sydney, NSW 2006, Australia
- Shila Ghazanfar
- School of Mathematics and Statistics, University of Sydney, Sydney, NSW 2006, Australia
- Charles Perkins Centre, University of Sydney, Sydney, NSW 2006, Australia
- Kevin Y X Wang
- School of Mathematics and Statistics, University of Sydney, Sydney, NSW 2006, Australia
- Kitty K Lo
- School of Mathematics and Statistics, University of Sydney, Sydney, NSW 2006, Australia
- Xianbin Su
- Key Laboratory of Systems Biomedicine, Ministry of Education, Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai 200240, China
- Collaborative Innovation Center of Systems Biomedicine, Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai 200240, China
- Ze-Guang Han
- Key Laboratory of Systems Biomedicine, Ministry of Education, Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai 200240, China
- Collaborative Innovation Center of Systems Biomedicine, Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai 200240, China
- John T Ormerod
- School of Mathematics and Statistics, University of Sydney, Sydney, NSW 2006, Australia
- Terence P Speed
- Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia
- Department of Mathematics and Statistics, University of Melbourne, Melbourne, VIC 3010, Australia
- Pengyi Yang
- School of Mathematics and Statistics, University of Sydney, Sydney, NSW 2006, Australia
- Charles Perkins Centre, University of Sydney, Sydney, NSW 2006, Australia
- Jean Yee Hwa Yang
- School of Mathematics and Statistics, University of Sydney, Sydney, NSW 2006, Australia
- Charles Perkins Centre, University of Sydney, Sydney, NSW 2006, Australia

9. Sonawane AR, Weiss ST, Glass K, Sharma A. Network Medicine in the Age of Biomedical Big Data. Front Genet 2019; 10:294. [PMID: 31031797 PMCID: PMC6470635 DOI: 10.3389/fgene.2019.00294] [Received: 12/26/2018] [Accepted: 03/19/2019]
Abstract
Network medicine is an emerging area of research dealing with molecular and genetic interactions, network biomarkers of disease, and therapeutic target discovery. Large-scale biomedical data generation offers a unique opportunity to assess the effect and impact of cellular heterogeneity and environmental perturbations on the observed phenotype. Combining the two, network medicine applied to biomedical data provides a framework to build meaningful models and extract impactful results at a network level. In this review, we survey existing network types and biomedical data sources. More importantly, we delve into ways in which the network medicine approach, aided by phenotype-specific biomedical data, can be gainfully applied. We provide three paradigms, mainly dealing with three major biological network archetypes: protein-protein interaction, expression-based, and gene regulatory networks. For each of these paradigms, we give a broad overview of the philosophies underlying various network methods. We also provide a few examples in each paradigm as test cases of successful application. Finally, we delineate several opportunities and challenges in the field of network medicine. We hope this review provides a common lexicon that helps researchers from the biological sciences and network theory get on the same page when working on research areas that require interdisciplinary expertise. Taken together, the understanding gained from combining biomedical data with networks can be useful for characterizing disease etiologies and identifying therapeutic targets, which, in turn, will lead to better preventive medicine with translational impact on personalized healthcare.
Affiliation(s)
- Abhijeet R. Sonawane
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA, United States
- Department of Medicine, Harvard Medical School, Boston, MA, United States
- Scott T. Weiss
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA, United States
- Department of Medicine, Harvard Medical School, Boston, MA, United States
- Kimberly Glass
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA, United States
- Department of Medicine, Harvard Medical School, Boston, MA, United States
- Amitabh Sharma
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA, United States
- Department of Medicine, Harvard Medical School, Boston, MA, United States
- Center for Interdisciplinary Cardiovascular Sciences, Cardiovascular Division, Brigham and Women’s Hospital, Boston, MA, United States

10. van der Wijst MGP, de Vries DH, Brugge H, Westra HJ, Franke L. An integrative approach for building personalized gene regulatory networks for precision medicine. Genome Med 2018; 10:96. [PMID: 30567569 PMCID: PMC6299585 DOI: 10.1186/s13073-018-0608-4]
Abstract
Only a small fraction of patients respond to the drug prescribed to treat their disease, which means that most are at risk of unnecessary exposure to side effects through ineffective drugs. This inter-individual variation in drug response is driven by differences in gene interactions caused by each patient's genetic background, environmental exposures, and the proportions of specific cell types involved in disease. These gene interactions can now be captured by building gene regulatory networks, by taking advantage of RNA velocity (the time derivative of the gene expression state), the ability to study hundreds of thousands of cells simultaneously, and the falling price of single-cell sequencing. Here, we propose an integrative approach that leverages these recent advances in single-cell data with the sensitivity of bulk data to enable the reconstruction of personalized, cell-type- and context-specific gene regulatory networks. We expect this approach will allow the prioritization of key driver genes for specific diseases and will provide knowledge that opens new avenues towards improved personalized healthcare.
Affiliation(s)
- Monique G P van der Wijst
- Department of Genetics, 5th floor ERIBA building, Antonius Deusinglaan 1, 9713AV Groningen, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
- Dylan H de Vries
- Department of Genetics, 5th floor ERIBA building, Antonius Deusinglaan 1, 9713AV Groningen, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
- Harm Brugge
- Department of Genetics, 5th floor ERIBA building, Antonius Deusinglaan 1, 9713AV Groningen, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
- Harm-Jan Westra
- Department of Genetics, 5th floor ERIBA building, Antonius Deusinglaan 1, 9713AV Groningen, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
- Lude Franke
- Department of Genetics, 5th floor ERIBA building, Antonius Deusinglaan 1, 9713AV Groningen, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands

11. Bisogni AJ, Ghazanfar S, Williams EO, Marsh HM, Yang JYH, Lin DM. Tuning of delta-protocadherin adhesion through combinatorial diversity. eLife 2018; 7:e41050. [PMID: 30547884 PMCID: PMC6326727 DOI: 10.7554/elife.41050] [Received: 08/17/2018] [Accepted: 12/11/2018]
Abstract
The delta-protocadherins (δ-Pcdhs) play key roles in neural development, and expression studies suggest they are expressed in combination within neurons. The extent of this combinatorial diversity, and how these combinations influence cell adhesion, is poorly understood. We show that individual mouse olfactory sensory neurons express 0-7 δ-Pcdhs. Despite this apparent combinatorial complexity, K562 cell aggregation assays revealed simple principles that mediate tuning of δ-Pcdh adhesion. Cells can vary the number of δ-Pcdhs expressed, the level of surface expression, and which δ-Pcdhs are expressed, as different members possess distinct apparent adhesive affinities. These principles contrast with those identified previously for the clustered protocadherins (cPcdhs), where the particular combination of cPcdhs expressed does not appear to be a critical factor. Despite these differences, we show δ-Pcdhs can modify cPcdh adhesion. Our studies show how intra- and interfamily interactions can greatly amplify the impact of this small subfamily on neuronal function.
Affiliation(s)
- Adam J Bisogni
- Department of Biomedical Sciences, Cornell University, Ithaca, United States
- Shila Ghazanfar
- School of Mathematics and Statistics, The University of Sydney, Sydney, Australia
- Eric O Williams
- Department of Biomedical Sciences, Cornell University, Ithaca, United States
- Department of Biology and Chemistry, Fitchburg State University, Fitchburg, United States
- Heather M Marsh
- Department of Biomedical Sciences, Cornell University, Ithaca, United States
- Jean YH Yang
- School of Mathematics and Statistics, The University of Sydney, Sydney, Australia
- David M Lin
- Department of Biomedical Sciences, Cornell University, Ithaca, United States

12. Chen S, Mar JC. Evaluating methods of inferring gene regulatory networks highlights their lack of performance for single cell gene expression data. BMC Bioinformatics 2018; 19:232. [PMID: 29914350 PMCID: PMC6006753 DOI: 10.1186/s12859-018-2217-z] [Received: 10/28/2017] [Accepted: 05/24/2018]
Abstract
BACKGROUND: A fundamental fact in biology states that genes do not operate in isolation, and yet methods that infer regulatory networks for single cell gene expression data have been slow to emerge. With single cell sequencing methods now becoming accessible, general network inference algorithms that were initially developed for data collected from bulk samples may not be suitable for single cells. Meanwhile, although methods that are specific for single cell data are now emerging, whether they have improved performance over general methods is unknown. In this study, we evaluate the applicability of five general methods and three single cell methods for inferring gene regulatory networks from both experimental single cell gene expression data and in silico simulated data.
RESULTS: Standard evaluation metrics using ROC curves and Precision-Recall curves against reference sets sourced from the literature demonstrated that most of the methods performed poorly when they were applied to either experimental single cell data or simulated single cell data, indicating their lack of performance for this task. Using default settings, network methods were applied to the same datasets. Comparisons of the learned networks highlighted the uniqueness of some predicted edges for each method. The fact that different methods infer networks that vary substantially reflects the underlying mathematical rationale and assumptions that distinguish network methods from each other.
CONCLUSIONS: This study provides a comprehensive evaluation of network modeling algorithms applied to experimental single cell gene expression data and in silico simulated datasets where the network structure is known. Comparisons demonstrate that most of these assessed network methods are not able to predict network structures from single cell expression data accurately, even those specifically developed for single cell data. Also, single cell methods, which usually depend on more elaborate algorithms, in general have less similarity to each other in the sets of edges detected. The results from this study emphasize the importance of developing more accurate, optimized network modeling methods that are compatible with single cell data. Newly developed single cell methods may uniquely capture particular features of potential gene-gene relationships, and caution should be taken when interpreting these results.
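The ROC and Precision-Recall evaluation against a reference edge set can be illustrated with a short sketch. This is not the authors' evaluation code: it assumes a hypothetical `scores` dictionary that gives a confidence to every candidate directed edge (regulator, target) and a hypothetical reference set `truth` whose edges are all among those candidates. AUROC is computed as the probability that a randomly chosen reference edge outranks a randomly chosen non-edge (ties count one half), and average precision is used as a common step-wise estimate of the area under the Precision-Recall curve.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (regulator, target); a hypothetical representation


def _split_scores(scores: Dict[Edge, float], truth: Set[Edge]) -> Tuple[List[float], List[float]]:
    """Separate the scores of reference edges (positives) from all other candidates."""
    missing = {e for e in truth if e not in scores}
    if missing:
        raise ValueError(f"reference edges without a score: {sorted(missing)}")
    pos = [s for e, s in scores.items() if e in truth]
    neg = [s for e, s in scores.items() if e not in truth]
    if not pos or not neg:
        raise ValueError("need at least one reference edge and one non-edge")
    return pos, neg


def auroc(scores: Dict[Edge, float], truth: Set[Edge]) -> float:
    """AUROC as the Mann-Whitney probability that a true edge outranks a non-edge."""
    pos, neg = _split_scores(scores, truth)
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def average_precision(scores: Dict[Edge, float], truth: Set[Edge]) -> float:
    """Mean of the precision values at the rank of each reference edge."""
    _split_scores(scores, truth)  # validate inputs only
    ranked = sorted(scores, key=scores.get, reverse=True)
    hits, total = 0, 0.0
    for rank, edge in enumerate(ranked, start=1):
        if edge in truth:
            hits += 1
            total += hits / rank  # precision at this cut-off
    return total / len(truth)


if __name__ == "__main__":
    # Toy example: three genes, all six ordered pairs scored, two reference edges.
    scores = {
        ("A", "B"): 0.9, ("A", "C"): 0.8, ("B", "C"): 0.7,
        ("B", "A"): 0.3, ("C", "A"): 0.2, ("C", "B"): 0.1,
    }
    truth = {("A", "B"), ("B", "C")}
    print(auroc(scores, truth))              # 0.875
    print(average_precision(scores, truth))  # about 0.833, i.e. (1/1 + 2/3) / 2
```

For a method that ranks edges at random, AUROC is about 0.5 and average precision is about the fraction of candidate edges that are true, so with sparse reference networks a low Precision-Recall score should be read relative to that baseline.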
Affiliation(s)
- Shuonan Chen
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, New York, USA
- Jessica C Mar
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, New York, USA; Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, New York, USA; Australian Institute for Bioengineering and Nanotechnology, University of Queensland, Brisbane, QLD, Australia
13
Abstract
The emerging single-cell RNA sequencing (scRNA-seq) technologies enable the investigation of transcriptomic landscapes at single-cell resolution. ScRNA-seq data analysis is complicated by excess zero counts, the so-called dropouts, which arise from the low amounts of mRNA sequenced within individual cells. We introduce scImpute, a statistical method to accurately and robustly impute the dropouts in scRNA-seq data. scImpute automatically identifies likely dropouts and only performs imputation on these values, without introducing new biases to the rest of the data. scImpute also detects outlier cells and excludes them from imputation. Evaluation based on both simulated and real human and mouse scRNA-seq data suggests that scImpute is an effective tool to recover transcriptome dynamics masked by dropouts. scImpute is shown to identify likely dropouts, enhance the clustering of cell subpopulations, improve the accuracy of differential expression analysis, and aid the study of gene expression dynamics.
Although single cell RNA-seq is widely used to explore cell heterogeneity and gene expression stochasticity, its analysis is complicated by excess zero counts (dropouts). Here, Li and Li develop scImpute for statistical imputation of dropouts in scRNA-seq data.
Affiliation(s)
- Wei Vivian Li
- Department of Statistics, University of California, Los Angeles, CA, 90095-1554, USA
- Jingyi Jessica Li
- Department of Statistics, University of California, Los Angeles, CA, 90095-1554, USA; Department of Human Genetics, University of California, Los Angeles, CA, 90095-7088, USA
14
Herbach U, Bonnaffoux A, Espinasse T, Gandrillon O. Inferring gene regulatory networks from single-cell data: a mechanistic approach. BMC Syst Biol 2017; 11:105. [PMID: 29157246 PMCID: PMC5697158 DOI: 10.1186/s12918-017-0487-0] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Received: 05/12/2017] [Accepted: 11/09/2017] [Indexed: 01/13/2023]
Abstract
Background: The recent development of single-cell transcriptomics has enabled gene expression to be measured in individual cells instead of being population-averaged. Despite this considerable precision improvement, inferring regulatory networks remains challenging because stochasticity now proves to play a fundamental role in gene expression. In particular, mRNA synthesis is now acknowledged to occur in a highly bursty manner.
Results: We propose to view the inference problem as a fitting procedure for a mechanistic gene network model that is inherently stochastic and takes not only protein, but also mRNA levels into account. We first explain how to build and simulate this network model based upon the coupling of genes that are described as piecewise-deterministic Markov processes. Our model is modular and can be used to implement various biochemical hypotheses including causal interactions between genes. However, a naive fitting procedure would be intractable. By performing a relevant approximation of the stationary distribution, we derive a tractable procedure that corresponds to a statistical hidden Markov model with interpretable parameters. This approximation turns out to be extremely close to the theoretical distribution in the case of a simple toggle-switch, and we show that it can indeed fit real single-cell data. As a first step toward inference, our approach was applied to a number of simple two-gene networks simulated in silico from the mechanistic model and satisfactorily recovered the original networks.
Conclusions: Our results demonstrate that functional interactions between genes can be inferred from the distribution of a mechanistic, dynamical stochastic model that is able to describe gene expression in individual cells. This approach seems promising in relation to the current explosion of single-cell expression data.
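The bursty, piecewise-deterministic picture of mRNA synthesis can be illustrated for a single gene. The sketch below is a simplification and not the authors' model, which couples several genes and also tracks protein levels. It assumes that bursts arrive as a Poisson process with rate `k_burst`, that each burst adds an exponentially distributed amount of mRNA with mean `mean_burst`, and that mRNA decays deterministically at rate `d` between bursts; the function and parameter names are hypothetical. Under these assumptions the stationary mRNA level follows a Gamma distribution with shape `k_burst / d` and scale `mean_burst`, so its long-run mean is `k_burst * mean_burst / d`, which gives a simple sanity check on the simulation.

```python
import math
import random
from typing import List


def simulate_bursty_mrna(k_burst: float, mean_burst: float, d: float,
                         t_end: float, dt_sample: float, seed: int = 0) -> List[float]:
    """Sample the mRNA level of one hypothetical gene on a regular time grid.

    Between bursts the level follows the deterministic ODE dm/dt = -d * m,
    i.e. m(t) = m(t0) * exp(-d * (t - t0)); bursts are the random jumps.
    """
    rng = random.Random(seed)
    t, m = 0.0, 0.0          # time of the last jump and the level right after it
    next_sample = dt_sample
    samples: List[float] = []
    while True:
        t_jump = t + rng.expovariate(k_burst)   # waiting time ~ Exp(k_burst)
        # Record grid points that fall on the current deterministic decay segment.
        while next_sample <= min(t_jump, t_end):
            samples.append(m * math.exp(-d * (next_sample - t)))
            next_sample += dt_sample
        if t_jump >= t_end:
            return samples
        # Decay up to the jump time, then add an Exp(mean_burst) burst.
        m = m * math.exp(-d * (t_jump - t)) + rng.expovariate(1.0 / mean_burst)
        t = t_jump


if __name__ == "__main__":
    k, b, d = 2.0, 5.0, 1.0
    xs = simulate_bursty_mrna(k, b, d, t_end=20_000.0, dt_sample=1.0)[100:]  # drop burn-in
    print(sum(xs) / len(xs), "vs theoretical mean", k * b / d)
```

Simulating the deterministic segments exactly, rather than with a fixed time step, keeps the sketch faithful to the piecewise-deterministic structure: randomness enters only through the jump times and burst sizes.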
Affiliation(s)
- Ulysse Herbach
- Univ Lyon, ENS de Lyon, Univ Claude Bernard, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, 46 allée d'Italie Site Jacques Monod, Lyon, F-69007, France; Inria Team Dracula, Inria Center Grenoble Rhône-Alpes, Lyon, France; Univ Lyon, Université Claude Bernard Lyon 1, CNRS UMR 5208, Institut Camille Jordan, 43 blvd. du 11 novembre 1918, Villeurbanne Cedex, F-6962, France
- Arnaud Bonnaffoux
- Univ Lyon, ENS de Lyon, Univ Claude Bernard, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, 46 allée d'Italie Site Jacques Monod, Lyon, F-69007, France; Inria Team Dracula, Inria Center Grenoble Rhône-Alpes, Lyon, France; The CoSMo company, 5 passage du Vercors, Lyon, 69007, France
- Thibault Espinasse
- Univ Lyon, Université Claude Bernard Lyon 1, CNRS UMR 5208, Institut Camille Jordan, 43 blvd. du 11 novembre 1918, Villeurbanne Cedex, F-6962, France
- Olivier Gandrillon
- Univ Lyon, ENS de Lyon, Univ Claude Bernard, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, 46 allée d'Italie Site Jacques Monod, Lyon, F-69007, France; Inria Team Dracula, Inria Center Grenoble Rhône-Alpes, Lyon, France
15
Schönbach C, Verma C, Bond PJ, Ranganathan S. Bioinformatics and systems biology research update from the 15th International Conference on Bioinformatics (InCoB2016). BMC Bioinformatics 2016; 17:524. [PMID: 28155668 PMCID: PMC5259976 DOI: 10.1186/s12859-016-1409-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/17/2022]
Abstract
The International Conference on Bioinformatics (InCoB) has been publishing peer-reviewed conference papers in BMC Bioinformatics since 2006. Of the 44 articles accepted for publication in supplement issues of BMC Bioinformatics, BMC Genomics, BMC Medical Genomics and BMC Systems Biology, 24 articles with a bioinformatics or systems biology focus are reviewed in this editorial. InCoB2017 is scheduled to be held in Shenzhen, China, September 20-22, 2017.
Affiliation(s)
- Christian Schönbach
- International Research Center for Medical Sciences, Graduate School of Medical Sciences, Kumamoto University, Kumamoto, 860-0811 Japan
- Chandra Verma
- Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), Singapore, 138671 Singapore
- Peter J. Bond
- Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), Singapore, 138671 Singapore
- Shoba Ranganathan
- Department of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, NSW 2109 Australia