101
Ventre E, Herbach U, Espinasse T, Benoit G, Gandrillon O. One model fits all: Combining inference and simulation of gene regulatory networks. PLoS Comput Biol 2023; 19:e1010962. [PMID: 36972296 PMCID: PMC10079230 DOI: 10.1371/journal.pcbi.1010962] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2022] [Revised: 04/06/2023] [Accepted: 02/17/2023] [Indexed: 03/29/2023] Open
Abstract
The rise of single-cell data highlights the need for a nondeterministic view of gene expression, while offering new opportunities regarding gene regulatory network inference. We recently introduced two strategies that specifically exploit time-course data, where single-cell profiling is performed after a stimulus: HARISSA, a mechanistic network model with a highly efficient simulation procedure, and CARDAMOM, a scalable inference method seen as model calibration. Here, we combine the two approaches and show that the same model driven by transcriptional bursting can be used simultaneously as an inference tool, to reconstruct biologically relevant networks, and as a simulation tool, to generate realistic transcriptional profiles emerging from gene interactions. We verify that CARDAMOM quantitatively reconstructs causal links when the data is simulated from HARISSA, and demonstrate its performance on experimental data collected on in vitro differentiating mouse embryonic stem cells. Overall, this integrated strategy largely overcomes the limitations of disconnected inference and simulation.
Affiliation(s)
- Elias Ventre
  - Laboratoire de Biologie et Modélisation de la Cellule, École Normale Supérieure de Lyon, CNRS, UMR 5239, Inserm, U1293, Université Claude Bernard Lyon 1, Lyon, France
  - Inria Center Grenoble Rhône-Alpes, Équipe Dracula, Villeurbanne, France
  - Univ Lyon, Université Claude Bernard Lyon 1, CNRS UMR 5208, Institut Camille Jordan, Villeurbanne, France
- Ulysse Herbach
  - Université de Lorraine, CNRS, Inria, IECL, Nancy, France
- Thibault Espinasse
  - Inria Center Grenoble Rhône-Alpes, Équipe Dracula, Villeurbanne, France
  - Univ Lyon, Université Claude Bernard Lyon 1, CNRS UMR 5208, Institut Camille Jordan, Villeurbanne, France
- Gérard Benoit
  - Laboratoire de Biologie et Modélisation de la Cellule, École Normale Supérieure de Lyon, CNRS, UMR 5239, Inserm, U1293, Université Claude Bernard Lyon 1, Lyon, France
- Olivier Gandrillon
  - Laboratoire de Biologie et Modélisation de la Cellule, École Normale Supérieure de Lyon, CNRS, UMR 5239, Inserm, U1293, Université Claude Bernard Lyon 1, Lyon, France
  - Inria Center Grenoble Rhône-Alpes, Équipe Dracula, Villeurbanne, France
102
Oubounyt M, Elkjaer ML, Laske T, Grønning AB, Moeller M, Baumbach J. De-novo reconstruction and identification of transcriptional gene regulatory network modules differentiating single-cell clusters. NAR Genom Bioinform 2023; 5:lqad018. [PMID: 36879901 PMCID: PMC9985332 DOI: 10.1093/nargab/lqad018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2022] [Revised: 01/16/2023] [Accepted: 02/09/2023] [Indexed: 03/07/2023] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) technology provides an unprecedented opportunity to understand gene functions and interactions at single-cell resolution. While computational tools for scRNA-seq data analysis to decipher differential gene expression profiles and differential pathway expression exist, we still lack methods to learn differential regulatory disease mechanisms directly from the single-cell data. Here, we provide a new methodology, named DiNiro, to unravel such mechanisms de novo and report them as small, easily interpretable transcriptional regulatory network modules. We demonstrate that DiNiro is able to uncover novel, relevant, and deep mechanistic models that not just predict but explain differential cellular gene expression programs. DiNiro is available at https://exbio.wzw.tum.de/diniro/.
Affiliation(s)
- Mhaned Oubounyt
  - Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
  - Chair of Experimental Bioinformatics, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
- Maria L Elkjaer
  - Department of Neurology, Odense University Hospital, Odense, Denmark
  - Institute of Clinical Research, University of Southern Denmark, Odense, Denmark
  - Institute of Molecular Medicine, University of Southern Denmark, Odense, Denmark
- Tanja Laske
  - Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Alexander G B Grønning
  - Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Marcus J Moeller
  - Heisenberg Chair of Preventive and Translational Nephrology, Department of Nephrology, Rheumatology and Clinical Immunology, RWTH Aachen University, Aachen, Germany
- Jan Baumbach
  - Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
  - Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
103
van der Sande M, Frölich S, van Heeringen SJ. Computational approaches to understand transcription regulation in development. Biochem Soc Trans 2023; 51:1-12. [PMID: 36695505 PMCID: PMC9988001 DOI: 10.1042/bst20210145] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2022] [Revised: 01/07/2023] [Accepted: 01/13/2023] [Indexed: 01/26/2023]
Abstract
Gene regulatory networks (GRNs) serve as useful abstractions to understand transcriptional dynamics in developmental systems. Computational prediction of GRNs has been successfully applied to genome-wide gene expression measurements with the advent of microarrays and RNA-sequencing. However, these inferred networks are inaccurate and mostly based on correlative rather than causative interactions. In this review, we highlight three approaches that significantly impact GRN inference: (1) moving from one genome-wide functional modality, gene expression, to multi-omics, (2) single cell sequencing, to measure cell type-specific signals and predict context-specific GRNs, and (3) neural networks as flexible models. Together, these experimental and computational developments have the potential to significantly impact the quality of inferred GRNs. Ultimately, accurately modeling the regulatory interactions between transcription factors and their target genes will be essential to understand the role of transcription factors in driving developmental gene expression programs and to derive testable hypotheses for validation.
Affiliation(s)
- Simon J. van Heeringen
  - Radboud University, Department of Molecular Developmental Biology, Faculty of Science, Radboud Institute for Molecular Life Sciences, 6525GA Nijmegen, The Netherlands
104
Song Q, Ruffalo M, Bar-Joseph Z. Using single cell atlas data to reconstruct regulatory networks. Nucleic Acids Res 2023; 51:e38. [PMID: 36762475 PMCID: PMC10123116 DOI: 10.1093/nar/gkad053] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 09/10/2022] [Revised: 12/16/2022] [Accepted: 01/19/2023] [Indexed: 02/11/2023] Open
Abstract
Inference of global gene regulatory networks from omics data is a long-term goal of systems biology. Most methods developed for inferring transcription factor (TF)-gene interactions either relied on a small dataset or used snapshot data which is not suitable for inferring a process that is inherently temporal. Here, we developed a new computational method that combines neural networks and multi-task learning to predict RNA velocity rather than gene expression values. This allows our method to overcome many of the problems faced by prior methods leading to more accurate and more comprehensive set of identified regulatory interactions. Application of our method to atlas scale single cell data from 6 HuBMAP tissues led to several validated and novel predictions and greatly improved on prior methods proposed for this task.
Affiliation(s)
- Qi Song
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Matthew Ruffalo
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Ziv Bar-Joseph
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
  - Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
105
Khozyainova AA, Valyaeva AA, Arbatsky MS, Isaev SV, Iamshchikov PS, Volchkov EV, Sabirov MS, Zainullina VR, Chechekhin VI, Vorobev RS, Menyailo ME, Tyurin-Kuzmin PA, Denisov EV. Complex Analysis of Single-Cell RNA Sequencing Data. Biochemistry (Mosc) 2023; 88:231-252. [PMID: 37072324 PMCID: PMC10000364 DOI: 10.1134/s0006297923020074] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 09/23/2022] [Revised: 12/13/2022] [Accepted: 12/13/2022] [Indexed: 03/12/2023]
Abstract
Single-cell RNA sequencing (scRNA-seq) is a revolutionary tool for studying the physiology of normal and pathologically altered tissues. This approach provides information about molecular features (gene expression, mutations, chromatin accessibility, etc.) of cells, opens up the possibility to analyze the trajectories/phylogeny of cell differentiation and cell-cell interactions, and helps in discovery of new cell types and previously unexplored processes. From a clinical point of view, scRNA-seq facilitates deeper and more detailed analysis of molecular mechanisms of diseases and serves as a basis for the development of new preventive, diagnostic, and therapeutic strategies. The review describes different approaches to the analysis of scRNA-seq data, discusses the advantages and disadvantages of bioinformatics tools, provides recommendations and examples of their successful use, and suggests potential directions for improvement. We also emphasize the need for creating new protocols, including multiomics ones, for the preparation of DNA/RNA libraries of single cells with the purpose of more complete understanding of individual cells.
Affiliation(s)
- Anna A Khozyainova
  - Laboratory of Cancer Progression Biology, Cancer Research Institute, Tomsk National Research Medical Center, Russian Academy of Sciences, Tomsk, 634050, Russia
- Anna A Valyaeva
  - Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow, 119991, Russia
  - Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Moscow, 119991, Russia
- Mikhail S Arbatsky
  - Laboratory of Artificial Intelligence and Bioinformatics, The Russian Clinical Research Center for Gerontology, Pirogov Russian National Medical University, Moscow, 129226, Russia
  - School of Public Administration, Lomonosov Moscow State University, Moscow, 119991, Russia
  - Faculty of Fundamental Medicine, Lomonosov Moscow State University, Moscow, 119991, Russia
- Sergey V Isaev
  - Research Institute of Personalized Medicine, National Center for Personalized Medicine of Endocrine Diseases, National Medical Research Center for Endocrinology, Moscow, 117036, Russia
- Pavel S Iamshchikov
  - Laboratory of Cancer Progression Biology, Cancer Research Institute, Tomsk National Research Medical Center, Russian Academy of Sciences, Tomsk, 634050, Russia
  - Laboratory of Complex Analysis of Big Bioimage Data, National Research Tomsk State University, Tomsk, 634050, Russia
- Egor V Volchkov
  - Department of Oncohematology, Dmitry Rogachev National Research Center of Pediatric Hematology, Oncology and Immunology, Moscow, 117198, Russia
- Marat S Sabirov
  - Laboratory of Bioinformatics and Molecular Genetics, Koltzov Institute of Developmental Biology of the Russian Academy of Sciences, Moscow, 119334, Russia
- Viktoria R Zainullina
  - Laboratory of Cancer Progression Biology, Cancer Research Institute, Tomsk National Research Medical Center, Russian Academy of Sciences, Tomsk, 634050, Russia
- Vadim I Chechekhin
  - Faculty of Fundamental Medicine, Lomonosov Moscow State University, Moscow, 119991, Russia
- Rostislav S Vorobev
  - Laboratory of Cancer Progression Biology, Cancer Research Institute, Tomsk National Research Medical Center, Russian Academy of Sciences, Tomsk, 634050, Russia
- Maxim E Menyailo
  - Laboratory of Cancer Progression Biology, Cancer Research Institute, Tomsk National Research Medical Center, Russian Academy of Sciences, Tomsk, 634050, Russia
- Pyotr A Tyurin-Kuzmin
  - Faculty of Fundamental Medicine, Lomonosov Moscow State University, Moscow, 119991, Russia
- Evgeny V Denisov
  - Laboratory of Cancer Progression Biology, Cancer Research Institute, Tomsk National Research Medical Center, Russian Academy of Sciences, Tomsk, 634050, Russia
106
Juan H, Huang H. Quantitative analysis of high-throughput biological data. WIREs Comput Mol Sci 2023. [DOI: 10.1002/wcms.1658] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/04/2023]
Affiliation(s)
- Hsueh-Fen Juan
  - Department of Life Science, Institute of Biomedical Electronics and Bioinformatics, and Center for Systems Biology, National Taiwan University, Taipei, Taiwan
  - Taiwan AI Labs, Taipei, Taiwan
- Hsuan-Cheng Huang
  - Institute of Biomedical Informatics, National Yang Ming Chiao Tung University, Taipei, Taiwan
107
Zhang Y, Wang M, Wang Z, Liu Y, Xiong S, Zou Q. MetaSEM: Gene Regulatory Network Inference from Single-Cell RNA Data by Meta-Learning. Int J Mol Sci 2023; 24:2595. [PMID: 36768917 PMCID: PMC9916710 DOI: 10.3390/ijms24032595] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/11/2022] [Revised: 01/23/2023] [Accepted: 01/26/2023] [Indexed: 01/31/2023] Open
Abstract
Regulators in gene regulatory networks (GRNs) are crucial for identifying cell states. However, GRN inference based on scRNA-seq data has several problems, including high dimensionality and sparsity, and requires more label data. Therefore, we propose a meta-learning GRN inference framework to identify regulatory factors. Specifically, meta-learning solves the parameter optimization problem caused by high-dimensional sparse data features. In addition, a few-shot solution was used to solve the problem of lack of label data. A structural equation model (SEM) was embedded in the model to identify important regulators. We integrated the parameter optimization strategy into the bi-level optimization to extract the feature consistent with GRN reasoning. This unique design makes our model robust to small-scale data. By studying the GRN inference task, we confirmed that the selected regulators were closely related to gene expression specificity. We further analyzed the GRN inferred to find the important regulators in cell type identification. Extensive experimental results showed that our model effectively captured the regulator in single-cell GRN inference. Finally, the visualization results verified the importance of the selected regulators for cell type recognition.
Affiliation(s)
- Yongqing Zhang
  - School of Computer Science, Chengdu University of Information Technology, Chengdu 610225, China
- Maocheng Wang
  - School of Computer Science, Chengdu University of Information Technology, Chengdu 610225, China
- Zixuan Wang
  - School of Computer Science, Chengdu University of Information Technology, Chengdu 610225, China
- Yuhang Liu
  - School of Computer Science, Chengdu University of Information Technology, Chengdu 610225, China
- Shuwen Xiong
  - School of Computer Science, Chengdu University of Information Technology, Chengdu 610225, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610051, China
108
Lin Z, Ou-Yang L. Inferring gene regulatory networks from single-cell gene expression data via deep multi-view contrastive learning. Brief Bioinform 2023; 24:6965907. [PMID: 36585783 DOI: 10.1093/bib/bbac586] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 10/10/2022] [Revised: 11/28/2022] [Accepted: 11/29/2022] [Indexed: 01/01/2023] Open
Abstract
The inference of gene regulatory networks (GRNs) is of great importance for understanding the complex regulatory mechanisms within cells. The emergence of single-cell RNA-sequencing (scRNA-seq) technologies enables the measure of gene expression levels for individual cells, which promotes the reconstruction of GRNs at single-cell resolution. However, existing network inference methods are mainly designed for data collected from a single data source, which ignores the information provided by multiple related data sources. In this paper, we propose a multi-view contrastive learning (DeepMCL) model to infer GRNs from scRNA-seq data collected from multiple data sources or time points. We first represent each gene pair as a set of histogram images, and then introduce a deep Siamese convolutional neural network with contrastive loss to learn the low-dimensional embedding for each gene pair. Moreover, an attention mechanism is introduced to integrate the embeddings extracted from different data sources and different neighbor gene pairs. Experimental results on synthetic and real-world datasets validate the effectiveness of our contrastive learning and attention mechanisms, demonstrating the effectiveness of our model in integrating multiple data sources for GRN inference.
Affiliation(s)
- Zerun Lin
  - Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen Key Laboratory of Media Security, and Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), College of Electronics and Information Engineering, Shenzhen University, Shenzhen, 518060, China
- Le Ou-Yang
  - Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen Key Laboratory of Media Security, and Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), College of Electronics and Information Engineering, Shenzhen University, Shenzhen, 518060, China
109
Caranica C, Lu M. A data-driven optimization method for coarse-graining gene regulatory networks. iScience 2023; 26:105927. [PMID: 36698721 PMCID: PMC9868542 DOI: 10.1016/j.isci.2023.105927] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2022] [Revised: 12/19/2022] [Accepted: 01/03/2023] [Indexed: 01/06/2023] Open
Abstract
One major challenge in systems biology is to understand how various genes in a gene regulatory network (GRN) collectively perform their functions and control network dynamics. This task becomes extremely hard to tackle in the case of large networks with hundreds of genes and edges, many of which have redundant regulatory roles and functions. The existing methods for model reduction usually require the detailed mathematical description of dynamical systems and their corresponding kinetic parameters, which are often not available. Here, we present a data-driven method for coarse-graining large GRNs, named SacoGraci, using ensemble-based mathematical modeling, dimensionality reduction, and gene circuit optimization by Markov Chain Monte Carlo methods. SacoGraci requires network topology as the only input and is robust against errors in GRNs. We benchmark and demonstrate its usage with synthetic, literature-based, and bioinformatics-derived GRNs. We hope SacoGraci will enhance our ability to model the gene regulation of complex biological systems.
Affiliation(s)
- Cristian Caranica
  - Department of Bioengineering, Northeastern University, Boston, MA 02115, USA
  - Center for Theoretical Biological Physics, Northeastern University, Boston, MA 02115, USA
- Mingyang Lu (corresponding author)
  - Department of Bioengineering, Northeastern University, Boston, MA 02115, USA
  - Center for Theoretical Biological Physics, Northeastern University, Boston, MA 02115, USA
  - The Jackson Laboratory, Bar Harbor, ME 04609, USA
110
Koumadorakis DE, Krokidis MG, Dimitrakopoulos GN, Vrahatis AG. A Consensus Gene Regulatory Network for Neurodegenerative Diseases Using Single-Cell RNA-Seq Data. Adv Exp Med Biol 2023; 1423:215-224. [PMID: 37525047 DOI: 10.1007/978-3-031-31978-5_20] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/02/2023]
Abstract
Gene regulatory network (GRN) inference from gene expression data is a highly complex and challenging task in systems biology. Despite the challenges, GRNs have emerged, and for complex diseases such as neurodegenerative diseases, they have the potential to provide vital information and identify key regulators. However, every GRN method produced predicts results based on its assumptions, providing limited biological insights. For that reason, the current work focused on the development of an ensemble method from individual GRN methods to address this issue. Four state-of-the-art GRN algorithms were selected to form a consensus GRN from their common gene interactions. Each algorithm uses a different construction method, and for a more robust behavior, both static and dynamic methods were selected as well. The algorithms were applied to a scRNA-seq dataset from the CK-p25 mus musculus model during neurodegeneration. The top subnetworks were constructed from the consensus network, and potential key regulators were identified. The results also demonstrated the overlap between the algorithms for the current dataset and the necessity for an ensemble approach. This work aims to demonstrate the creation of an ensemble network and provide insights into whether a combination of different GRN methods can produce valuable results.
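The consensus step described in this abstract (keeping the gene interactions that several inference methods agree on) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `consensus_edges`, the `min_support` parameter, and the toy gene names are all assumptions introduced here.

```python
from collections import Counter
from typing import Iterable, Optional, Set, Tuple

Edge = Tuple[str, str]  # (regulator, target); hypothetical representation


def consensus_edges(networks: Iterable[Set[Edge]],
                    min_support: Optional[int] = None) -> Set[Edge]:
    """Return the edges reported by at least `min_support` networks.

    With the default (None), an edge must appear in every network,
    i.e. the consensus is the plain intersection of the edge sets.
    """
    nets = list(networks)
    if not nets:
        return set()
    if min_support is None:
        min_support = len(nets)
    # Count in how many networks each directed edge occurs.
    counts = Counter(edge for net in nets for edge in net)
    return {edge for edge, n in counts.items() if n >= min_support}


# Toy edge lists standing in for the output of three inference methods.
method_a = {("G1", "G2"), ("G2", "G3")}
method_b = {("G1", "G2"), ("G2", "G3"), ("G3", "G1")}
method_c = {("G1", "G2")}

strict = consensus_edges([method_a, method_b, method_c])
relaxed = consensus_edges([method_a, method_b, method_c], min_support=2)
```

Requiring agreement from all methods is the strictest choice; lowering `min_support` trades precision for coverage.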
Affiliation(s)
- Dimitrios E Koumadorakis
  - Bioinformatics and Human Electrophysiology Lab (BiHELab), Department of Informatics, Ionian University, Corfu, Greece
- Marios G Krokidis
  - Bioinformatics and Human Electrophysiology Lab (BiHELab), Department of Informatics, Ionian University, Corfu, Greece
- Georgios N Dimitrakopoulos
  - Bioinformatics and Human Electrophysiology Lab (BiHELab), Department of Informatics, Ionian University, Corfu, Greece
- Aristidis G Vrahatis
  - Bioinformatics and Human Electrophysiology Lab (BiHELab), Department of Informatics, Ionian University, Corfu, Greece
111
Abstract
One of the key questions in systems biology is to understand the roles of gene regulatory circuits in determining cellular states and their functions. In previous studies, some researchers have inferred large gene networks from genome wide genomics/transcriptomics data using the top-down approach, while others have modeled core gene circuits of small sizes using the bottom-up approach. Despite many existing systems biology studies, there is still no general rule on what sizes of gene networks and what types of circuit motifs a system would need to achieve robust biological functions. Here, we adopt a gene circuit motif analysis to discover four-node circuits responsible for multiplicity (rich in dynamical behavior), flexibility (versatile to alter gene expression), or both. We identify the most reoccurring two-node circuit motifs and the co-occurring motif pairs. Furthermore, we investigate the contributing factors of multiplicity and flexibility for large gene networks of different types and sizes. We find that gene networks of intermediate sizes tend to have combined high levels of multiplicity and flexibility. Our study will contribute to a better understanding of the dynamical mechanisms of gene regulatory circuits and provide insights into rational designs of robust gene circuits in synthetic and systems biology.
Affiliation(s)
- Lijia Huang
  - Center for Theoretical Biological Physics, Northeastern University, Boston, MA 02115, USA
  - Department of Bioengineering, Northeastern University, Boston, MA 02115, USA
- Benjamin Clauss
  - Center for Theoretical Biological Physics, Northeastern University, Boston, MA 02115, USA
  - Genetics Program, Graduate School of Biomedical Sciences, Tufts University, Boston, MA 02111, USA
- Mingyang Lu
  - Center for Theoretical Biological Physics, Northeastern University, Boston, MA 02115, USA
  - Department of Bioengineering, Northeastern University, Boston, MA 02115, USA
  - Genetics Program, Graduate School of Biomedical Sciences, Tufts University, Boston, MA 02111, USA
  - The Jackson Laboratory, Bar Harbor, ME 04609, USA
112
Coulier A, Singh P, Sturrock M, Hellander A. Systematic comparison of modeling fidelity levels and parameter inference settings applied to negative feedback gene regulation. PLoS Comput Biol 2022; 18:e1010683. [PMID: 36520957 PMCID: PMC9799300 DOI: 10.1371/journal.pcbi.1010683] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2022] [Revised: 12/29/2022] [Accepted: 10/25/2022] [Indexed: 12/23/2022] Open
Abstract
Quantitative stochastic models of gene regulatory networks are important tools for studying cellular regulation. Such models can be formulated at many different levels of fidelity. A practical challenge is to determine what model fidelity to use in order to get accurate and representative results. The choice is important, because models of successively higher fidelity come at a rapidly increasing computational cost. In some situations, the level of detail is clearly motivated by the question under study. In many situations however, many model options could qualitatively agree with available data, depending on the amount of data and the nature of the observations. Here, an important distinction is whether we are interested in inferring the true (but unknown) physical parameters of the model or if it is sufficient to be able to capture and explain available data. The situation becomes complicated from a computational perspective because inference needs to be approximate. Most often it is based on likelihood-free Approximate Bayesian Computation (ABC) and here determining which summary statistics to use, as well as how much data is needed to reach the desired level of accuracy, are difficult tasks. Ultimately, all of these aspects (the model fidelity, the available data, and the numerical choices for inference) interplay in a complex manner. In this paper we develop a computational pipeline designed to systematically evaluate inference accuracy for a wide range of true known parameters. We then use it to explore inference settings for negative feedback gene regulation. In particular, we compare a detailed spatial stochastic model, a coarse-grained compartment-based multiscale model, and the standard well-mixed model, across several data-scenarios and for multiple numerical options for parameter inference. Practically speaking, this pipeline can be used as a preliminary step to guide modelers prior to gathering experimental data. By training Gaussian processes to approximate the distance function values, we are able to substantially reduce the computational cost of running the pipeline.
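The likelihood-free ABC inference mentioned in this abstract can be illustrated with a basic rejection sampler. This is a generic sketch under stated assumptions, not the pipeline from the paper: the Gaussian toy simulator, the uniform prior on [0, 10], the sample mean as the only summary statistic, and the tolerance `epsilon` are all hypothetical choices made for illustration.

```python
import random
import statistics


def simulate(mu, n, rng):
    """Toy stand-in for a stochastic model: n noisy observations around mu."""
    return [rng.gauss(mu, 1.0) for _ in range(n)]


def abc_rejection(observed, n_draws, epsilon, rng):
    """Keep prior draws whose simulated summary is within epsilon of the data."""
    obs_summary = statistics.fmean(observed)  # summary statistic: sample mean
    accepted = []
    for _ in range(n_draws):
        mu = rng.uniform(0.0, 10.0)  # draw a candidate parameter from the prior
        sim = simulate(mu, len(observed), rng)
        # Distance between simulated and observed summary statistics.
        if abs(statistics.fmean(sim) - obs_summary) < epsilon:
            accepted.append(mu)
    return accepted


rng = random.Random(0)  # fixed seed so the run is reproducible
observed = simulate(3.0, 50, rng)  # synthetic data with known true parameter 3.0
posterior = abc_rejection(observed, n_draws=2000, epsilon=0.2, rng=rng)
posterior_mean = statistics.fmean(posterior)
```

The accepted draws approximate the posterior; a smaller `epsilon` tightens the approximation but rejects more simulations, which is the cost trade-off the abstract alludes to.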
Affiliation(s)
- Adrien Coulier
  - Department of Information Technology, Uppsala University, Uppsala, Sweden
- Prashant Singh
  - Science for Life Laboratory, Department of Information Technology, Uppsala University, Uppsala, Sweden
- Marc Sturrock
  - Department of Physiology, Royal College of Surgeons in Ireland, Dublin, Ireland
- Andreas Hellander
  - Department of Information Technology, Uppsala University, Uppsala, Sweden
113
Chuwdhury GS, Ng IOL, Ho DWH. scAnalyzeR: A Comprehensive Software Package With Graphical User Interface for Single-Cell RNA Sequencing Analysis and its Application on Liver Cancer. Technol Cancer Res Treat 2022; 21:15330338221142729. [PMID: 36476060 PMCID: PMC9742707 DOI: 10.1177/15330338221142729] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/13/2022] Open
Abstract
Introduction: The application of single-cell RNA sequencing to delineate tissue heterogeneity and complexity has become increasingly popular. Given its tremendous resolution and high-dimensional capacity for in-depth investigation, single-cell RNA sequencing offers an unprecedented research power. Although some popular software packages are available for single-cell RNA sequencing data analysis and visualization, it is still a big challenge for their usage, as they provide only a command-line interface and require significant level of bioinformatics skills. Methods: We have developed scAnalyzeR, which is a single-cell RNA sequencing analysis pipeline with an interactive and user-friendly graphical interface for analyzing and visualizing single-cell RNA sequencing data. It accepts single-cell RNA sequencing data from various technology platforms and different model organisms (human and mouse) and allows flexibility in input file format. It provides functionalities for data preprocessing, quality control, basic summary statistics, dimension reduction, unsupervised clustering, differential gene expression, gene set enrichment analysis, correlation analysis, pseudotime cell trajectory inference, and various visualization plots. It also provides default parameters for easy usage and allows a wide range of flexibility and optimization by accepting user-defined options. It has been developed as a docker image that can be run in any docker-supported environment including Linux, Mac, and Windows, without installing any dependencies. Results: We compared the performance of scAnalyzeR with 2 other graphical tools that are popular for analyzing single-cell RNA sequencing data. The comparison was based on the comprehensiveness of functionalities, ease of usage and flexibility, and execution time. In general, scAnalyzeR outperformed the other tested counterparts in various aspects, demonstrating its superior overall performance. To illustrate the usefulness of scAnalyzeR in cancer research, we have analyzed the in-house liver cancer single-cell RNA sequencing dataset. Liver cancer tumor cells were revealed to have multiple subpopulations with distinctive gene expression signatures. Conclusion: scAnalyzeR has comprehensive functionalities and demonstrated usability. We anticipate more functionalities to be adopted in the future development.
Affiliation(s)
- GS Chuwdhury
- Department of Pathology and State Key Laboratory of Liver Research, The University of Hong Kong, Hong Kong
| | - Irene Oi-Lin Ng
- Department of Pathology and State Key Laboratory of Liver Research, The University of Hong Kong, Hong Kong
| | - Daniel Wai-Hung Ho
- Department of Pathology and State Key Laboratory of Liver Research, The University of Hong Kong, Hong Kong
|
114
|
Su M, Pan T, Chen QZ, Zhou WW, Gong Y, Xu G, Yan HY, Li S, Shi QZ, Zhang Y, He X, Jiang CJ, Fan SC, Li X, Cairns MJ, Wang X, Li YS. Data analysis guidelines for single-cell RNA-seq in biomedical studies and clinical applications. Mil Med Res 2022; 9:68. [PMID: 36461064 PMCID: PMC9716519 DOI: 10.1186/s40779-022-00434-8] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Received: 09/27/2022] [Accepted: 11/18/2022] [Indexed: 12/03/2022] Open
Abstract
The application of single-cell RNA sequencing (scRNA-seq) in biomedical research has advanced our understanding of the pathogenesis of disease and provided valuable insights into new diagnostic and therapeutic strategies. With the expansion of capacity for high-throughput scRNA-seq, including clinical samples, the analysis of these huge volumes of data has become a daunting prospect for researchers entering this field. Here, we review the workflow for typical scRNA-seq data analysis, covering raw data processing and quality control, basic data analysis applicable for almost all scRNA-seq data sets, and advanced data analysis that should be tailored to specific scientific questions. While summarizing the current methods for each analysis step, we also provide an online repository of software and wrapped-up scripts to support the implementation. Recommendations and caveats are pointed out for some specific analysis tasks and approaches. We hope this resource will be helpful to researchers engaging with scRNA-seq, in particular for emerging clinical applications.
Affiliation(s)
- Min Su
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, 211166 China
| | - Tao Pan
- College of Biomedical Information and Engineering, the First Affiliated Hospital of Hainan Medical University, Hainan Medical University, Haikou, 571199 Hainan China
| | - Qiu-Zhen Chen
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, 211166 China
| | - Wei-Wei Zhou
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081 Heilongjiang China
| | - Yi Gong
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, 211166 China
- Department of Immunology, Nanjing Medical University, Nanjing, 211166 China
| | - Gang Xu
- College of Biomedical Information and Engineering, the First Affiliated Hospital of Hainan Medical University, Hainan Medical University, Haikou, 571199 Hainan China
| | - Huan-Yu Yan
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, 211166 China
| | - Si Li
- College of Biomedical Information and Engineering, the First Affiliated Hospital of Hainan Medical University, Hainan Medical University, Haikou, 571199 Hainan China
| | - Qiao-Zhen Shi
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, 211166 China
| | - Ya Zhang
- College of Biomedical Information and Engineering, the First Affiliated Hospital of Hainan Medical University, Hainan Medical University, Haikou, 571199 Hainan China
| | - Xiao He
- Department of Laboratory Medicine, Women and Children’s Hospital of Chongqing Medical University, Chongqing, 401174 China
| | | | - Shi-Cai Fan
- Shenzhen Institute for Advanced Study, University of Electronic Science and Technology of China, Shenzhen, 518110 Guangdong China
| | - Xia Li
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081 Heilongjiang China
| | - Murray J. Cairns
- School of Biomedical Sciences and Pharmacy, Faculty of Health and Medicine, the University of Newcastle, University Drive, Callaghan, NSW 2308 Australia
- Precision Medicine Research Program, Hunter Medical Research Institute, New Lambton Heights, NSW 2305 Australia
| | - Xi Wang
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, 211166 China
| | - Yong-Sheng Li
- College of Biomedical Information and Engineering, the First Affiliated Hospital of Hainan Medical University, Hainan Medical University, Haikou, 571199 Hainan China
|
115
|
Chen Z, King WC, Hwang A, Gerstein M, Zhang J. DeepVelo: Single-cell transcriptomic deep velocity field learning with neural ordinary differential equations. Science Advances 2022; 8:eabq3745. [PMID: 36449617 PMCID: PMC9710871 DOI: 10.1126/sciadv.abq3745] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Indexed: 05/14/2023]
Abstract
Recent advances in single-cell sequencing technologies have provided unprecedented opportunities to measure the gene expression profile and RNA velocity of individual cells. However, modeling transcriptional dynamics is computationally challenging because of the high-dimensional, sparse nature of the single-cell gene expression measurements and the nonlinear regulatory relationships. Here, we present DeepVelo, a neural network-based ordinary differential equation that can model complex transcriptome dynamics by describing continuous-time gene expression changes within individual cells. We apply DeepVelo to public datasets from different sequencing platforms to (i) formulate transcriptome dynamics on different time scales, (ii) measure the instability of cell states, and (iii) identify developmental driver genes via perturbation analysis. Benchmarking against the state-of-the-art methods shows that DeepVelo can learn a more accurate representation of the velocity field. Furthermore, our perturbation studies reveal that single-cell dynamical systems could exhibit chaotic properties. In summary, DeepVelo allows data-driven discoveries of differential equations that delineate single-cell transcriptome dynamics.
Affiliation(s)
- Zhanlin Chen
- Department of Statistics and Data Science, Yale University, New Haven, CT 06520, USA
| | - William C. King
- Healthcare and Life Sciences, Microsoft, Redmond, WA 98052, USA
| | - Aheyon Hwang
- Mathematical, Computational, and Systems Biology, University of California, Irvine, Irvine, CA 92697, USA
| | - Mark Gerstein
- Department of Statistics and Data Science, Yale University, New Haven, CT 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Department of Computer Science, Yale University, New Haven, CT 06520, USA
- Corresponding author
| | - Jing Zhang
- Department of Computer Science, University of California, Irvine, Irvine, CA 92697, USA
- Corresponding author
|
116
|
Guan J, Wang Y, Wang Y, Zhuang Y, Ji G. SRGS: sparse partial least squares-based recursive gene selection for gene regulatory network inference. BMC Genomics 2022; 23:782. [PMID: 36451086 PMCID: PMC9710113 DOI: 10.1186/s12864-022-09020-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2022] [Accepted: 11/16/2022] [Indexed: 12/02/2022] Open
Abstract
BACKGROUND The identification of gene regulatory networks (GRNs) facilitates the understanding of the underlying molecular mechanism of various biological processes and complex diseases. With the availability of single-cell RNA sequencing data, it is essential to infer GRNs from single-cell expression. Although some GRN methods originally developed for bulk expression data are applicable to single-cell data and several single-cell-specific GRN algorithms have been developed, recent benchmarking studies have emphasized the need to develop more accurate and robust GRN modeling methods that are compatible with single-cell expression data. RESULTS We present SRGS, SPLS (sparse partial least squares)-based recursive gene selection, to infer GRNs from bulk or single-cell expression data. SRGS recursively selects and scores the genes that may regulate the considered target gene based on SPLS. When dealing with gene expression data with dropouts, we randomly scramble samples, set some values in the expression matrix to zeroes, and generate multiple copies of data through multiple iterations to make SRGS more robust. We test SRGS on different kinds of expression data, including simulated bulk data, simulated single-cell data without and with dropouts, and experimental single-cell data, and compare it with existing GRN methods, including those originally developed for bulk data, those developed specifically for single-cell data, and those recommended by recent benchmarking studies. CONCLUSIONS SRGS is competitive with the existing GRN methods and effective in inferring gene regulatory networks from bulk or single-cell gene expression data. SRGS is available at: https://github.com/JGuan-lab/SRGS.
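The dropout-handling step described in this abstract (scramble samples, zero out some entries, generate several copies) can be illustrated with a minimal sketch. This is not the SRGS implementation; the function name `make_dropout_copies` and the parameters `n_copies`, `zero_frac`, and `seed` are hypothetical, and the zeroing rule (independent per-entry masking) is an assumption.

```python
import random


def make_dropout_copies(expr, n_copies=5, zero_frac=0.1, seed=0):
    """Return shuffled, zero-masked copies of a samples-by-genes matrix.

    expr is a list of rows (one per sample or cell), each a list of floats.
    All names and defaults here are illustrative assumptions.
    """
    rng = random.Random(seed)
    copies = []
    for _ in range(n_copies):
        # Copy the rows so the input matrix is never modified.
        rows = [list(row) for row in expr]
        # Scramble the sample order; each row stays intact.
        rng.shuffle(rows)
        # Zero each entry independently with probability zero_frac.
        for row in rows:
            for j in range(len(row)):
                if rng.random() < zero_frac:
                    row[j] = 0.0
        copies.append(rows)
    return copies
```

An inference method could then be run on each copy and the per-edge scores averaged; how SRGS itself aggregates the copies is not stated in the abstract.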
Affiliation(s)
- Jinting Guan
- Department of Automation, Xiamen University, Xiamen, Fujian, China
- National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian, China
| | - Yang Wang
- Department of Automation, Xiamen University, Xiamen, Fujian, China
| | - Yongjie Wang
- Department of Automation, Xiamen University, Xiamen, Fujian, China
| | - Yan Zhuang
- Department of Automation, Xiamen University, Xiamen, Fujian, China
| | - Guoli Ji
- Department of Automation, Xiamen University, Xiamen, Fujian, China
- National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian, China
|
117
|
Mao G, Zeng R, Peng J, Zuo K, Pang Z, Liu J. Reconstructing gene regulatory networks of biological function using differential equations of multilayer perceptrons. BMC Bioinformatics 2022; 23:503. [PMID: 36434499 PMCID: PMC9700916 DOI: 10.1186/s12859-022-05055-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2022] [Accepted: 11/14/2022] [Indexed: 11/26/2022] Open
Abstract
BACKGROUND Building biological networks with a certain function is a challenge in systems biology. For small (fewer than ten nodes) biological networks, most methods work by exhaustively searching all possible network topologies. This exhaustive approach is difficult to scale to large biological networks. Moreover, regulatory relationships are complex and often nonlinear or non-monotonic, which makes inference using linear models challenging. RESULTS In this paper, we propose a multi-layer perceptron-based differential equation method, which operates by training a fully connected neural network (NN) to simulate the transcription rate of genes in traditional differential equations. We verify whether the regulatory network constructed by the NN method can still achieve the expected biological function by measuring the degree of overlap between the regulatory network discovered by the NN and the regulatory network constructed by the Hill function. We also validate our approach by adapting to noise signals, regulator knockout, and constructing large-scale gene regulatory networks using link-knockout techniques. We apply a real dataset (the mesoderm inducer Xenopus Brachyury expression) to construct the core topology of the gene regulatory network and find that Xbra is only strongly expressed at moderate levels of activin signaling. CONCLUSION The results demonstrate that this method can identify the underlying network topology and functional mechanisms, and can also be applied to larger and more complex gene network topologies.
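For readers unfamiliar with the "traditional differential equations" this abstract uses as a reference, the following minimal sketch integrates a single target gene whose transcription rate is a Hill function of a fixed regulator level. The equation form dy/dt = beta * H(x) - gamma * y, the parameter names (`k`, `n`, `beta`, `gamma`), and the forward Euler scheme are assumptions for illustration, not details taken from the paper; in the NN method the Hill term would be replaced by a trained network.

```python
def hill_activation(x, k=1.0, n=2.0):
    """Activating Hill function x^n / (k^n + x^n), with values in [0, 1)."""
    return x ** n / (k ** n + x ** n)


def simulate_target(regulator_level, steps=2000, dt=0.01, beta=1.0, gamma=1.0):
    """Forward Euler integration of dy/dt = beta * H(x) - gamma * y from y = 0.

    regulator_level is held constant; all parameters are illustrative.
    """
    y = 0.0
    for _ in range(steps):
        dy = beta * hill_activation(regulator_level) - gamma * y
        y += dt * dy
    return y
```

Under these assumptions the target converges to the steady state beta * H(x) / gamma, so a regulator at the half-saturation level k drives the target to half of its maximum.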
Affiliation(s)
- Guo Mao
- Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Deya Road, Changsha, 410073 China
| | - Ruigeng Zeng
- Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Deya Road, Changsha, 410073 China
| | - Jintao Peng
- Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Deya Road, Changsha, 410073 China
| | - Ke Zuo
- Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Deya Road, Changsha, 410073 China
| | - Zhengbin Pang
- Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Deya Road, Changsha, 410073 China
| | - Jie Liu
- Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Deya Road, Changsha, 410073 China
- Laboratory of Software Engineering for Complex System, National University of Defense Technology, Deya Road, Changsha, 410073 China
|
118
|
Dindhoria K, Monga I, Thind AS. Computational approaches and challenges for identification and annotation of non-coding RNAs using RNA-Seq. Funct Integr Genomics 2022; 22:1105-1112. [DOI: 10.1007/s10142-022-00915-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2022] [Revised: 11/04/2022] [Accepted: 11/04/2022] [Indexed: 11/22/2022]
|
119
|
Xu Y, Chen J, Lyu A, Cheung WK, Zhang L. dynDeepDRIM: a dynamic deep learning model to infer direct regulatory interactions using time-course single-cell gene expression data. Brief Bioinform 2022; 23:6720420. [PMID: 36168811 DOI: 10.1093/bib/bbac424] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/23/2022] [Revised: 08/02/2022] [Accepted: 09/01/2022] [Indexed: 12/14/2022] Open
Abstract
Time-course single-cell RNA sequencing (scRNA-seq) data have been widely used to explore dynamic changes in gene expression of transcription factors (TFs) and their target genes. This information is useful to reconstruct cell-type-specific gene regulatory networks (GRNs). However, the existing tools are commonly designed to analyze either time-course bulk gene expression data or static scRNA-seq data via pseudo-time cell ordering. A few methods successfully utilize the information from multiple time points while also considering the characteristics of scRNA-seq data. We proposed dynDeepDRIM, a novel deep learning model to reconstruct GRNs using time-course scRNA-seq data. It represents the joint expression of a gene pair as an image and utilizes the image of the target TF-gene pair and those of the potential neighbors to reconstruct GRNs from time-course scRNA-seq data. dynDeepDRIM can effectively remove the transitive TF-gene interactions by considering neighborhood context and model the gene expression dynamics using high-dimensional tensors. We compared dynDeepDRIM with six GRN reconstruction methods on both simulated data and four real time-course scRNA-seq datasets. dynDeepDRIM achieved substantially better performance than the other methods in inferring TF-gene interactions and eliminated the false positives effectively. We also applied dynDeepDRIM to annotate gene functions and found that it achieved clearly better performance than the other tools because it considers the neighbor genes.
Affiliation(s)
- Yu Xu
- Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong
| | - Jiaxing Chen
- Computer Science and Technology, Division of Science and Technology, BNU-HKBU United International College, Jintong Road, 519087, Zhuhai, China
| | - Aiping Lyu
- School of Chinese Medicine, Hong Kong Baptist University, Kowloon Tong, Hong Kong
| | - William K Cheung
- Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong
| | - Lu Zhang
- Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong
|
120
|
Ferrari C, Manosalva Pérez N, Vandepoele K. MINI-EX: Integrative inference of single-cell gene regulatory networks in plants. Molecular Plant 2022; 15:1807-1824. [PMID: 36307979 DOI: 10.1016/j.molp.2022.10.016] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 07/01/2022] [Revised: 09/30/2022] [Accepted: 10/21/2022] [Indexed: 05/26/2023]
Abstract
Multicellular organisms, such as plants, are characterized by highly specialized and tightly regulated cell populations, establishing specific morphological structures and executing distinct functions. Gene regulatory networks (GRNs) describe condition-specific interactions of transcription factors (TFs) regulating the expression of target genes, underpinning these specific functions. As efficient and validated methods to identify cell-type-specific GRNs from single-cell data in plants are lacking, limiting our understanding of the organization of specific cell types in both model species and crops, we developed MINI-EX (Motif-Informed Network Inference based on single-cell EXpression data), an integrative approach to infer cell-type-specific networks in plants. MINI-EX uses single-cell transcriptomic data to define expression-based networks and integrates TF motif information to filter the inferred regulons, resulting in networks with increased accuracy. Next, regulons are assigned to different cell types, leveraging cell-specific expression, and candidate regulators are prioritized using network centrality measures, functional annotations, and expression specificity. This embedded prioritization strategy offers a unique and efficient means to unravel signaling cascades in specific cell types controlling a biological process of interest. We demonstrate the stability of MINI-EX toward input data sets with low number of cells and its robustness toward missing data, and show that it infers state-of-the-art networks with a better performance compared with other related single-cell network tools. MINI-EX successfully identifies key regulators controlling root development in Arabidopsis and rice, leaf development in Arabidopsis, and ear development in maize, enhancing our understanding of cell-type-specific regulation and unraveling the roles of different regulators controlling the development of specific cell types in plants.
Affiliation(s)
- Camilla Ferrari
- Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Ghent, Belgium; Center for Plant Systems Biology, VIB, 9052 Ghent, Belgium
| | - Nicolás Manosalva Pérez
- Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Ghent, Belgium; Center for Plant Systems Biology, VIB, 9052 Ghent, Belgium
| | - Klaas Vandepoele
- Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Ghent, Belgium; Center for Plant Systems Biology, VIB, 9052 Ghent, Belgium; Bioinformatics Institute Ghent, Ghent University, 9052 Ghent, Belgium.
|
121
|
Majumder S, Thakran Y, Pal V, Singh K. Fuzzy and Rough Set Theory Based Computational Framework for Mining Genetic Interaction Triplets From Gene Expression Profiles for Lung Adenocarcinoma. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:3469-3481. [PMID: 34665736 DOI: 10.1109/tcbb.2021.3120844] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/13/2023]
Abstract
Genetic interactions are very helpful in understanding different diseases and discovering drugs for them. Compared to gene pairs, which represent the genetic interactions between two genes, gene triplets are more informative and useful. However, existing works on genetic interactions among gene triplets have primarily focused on detecting gene triplets from time series gene expression profiles. Generating time series gene expression profiles for humans is quite impracticable, but labeled gene expression profiles are available for different human diseases. In this paper, a computational framework is proposed to detect gene triplets from labeled gene expression profiles. First, it employs Rough Set Theory to extract the key genes and then designs a fuzzy inference system to generate possible gene triplets. Further, the Root Mean Squared Error measure is used to prune out the irrelevant gene triplets. In the present work, the proposed computational framework has been applied to a labeled lung adenocarcinoma dataset and can be applied to any other labeled gene expression dataset. The extracted gene triplets and their functionalities have been verified against existing biological literature and benchmark databases, and the results of this verification indicate that the proposed framework is promising for finding useful genetic triplets. Further, the proposed framework has been found to be more efficient than an existing mutual information-based technique at detecting known genetic interactions.
|
122
|
Jiang J, Lyu P, Li J, Huang S, Tao J, Blackshaw S, Qian J, Wang J. IReNA: Integrated regulatory network analysis of single-cell transcriptomes and chromatin accessibility profiles. iScience 2022; 25:105359. [PMID: 36325073 PMCID: PMC9619378 DOI: 10.1016/j.isci.2022.105359] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 03/29/2022] [Revised: 09/19/2022] [Accepted: 10/12/2022] [Indexed: 11/16/2022] Open
Abstract
Recently, single-cell RNA sequencing (scRNA-seq) and single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) have been developed to separately measure transcriptomes and chromatin accessibility profiles at the single-cell resolution. However, few methods can reliably integrate these data to perform regulatory network analysis. Here, we developed integrated regulatory network analysis (IReNA) for network inference through the integrated analysis of scRNA-seq and scATAC-seq data, network modularization, transcription factor enrichment, and construction of simplified intermodular regulatory networks. Using public datasets, we showed that integrated network analysis of scRNA-seq data with scATAC-seq data is more precise to identify known regulators than scRNA-seq data analysis alone. Moreover, IReNA outperformed currently available methods in identifying known regulators. IReNA facilitates the systems-level understanding of biological regulatory mechanisms and is available at https://github.com/jiang-junyao/IReNA.
Affiliation(s)
- Junyao Jiang
- CAS Key Laboratory of Regenerative Biology, Guangdong Provincial Key Laboratory of Biocomputing, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
| | - Pin Lyu
- Department of Ophthalmology, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
| | - Jinlian Li
- CAS Key Laboratory of Regenerative Biology, Guangdong Provincial Key Laboratory of Biocomputing, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
| | - Sunan Huang
- CAS Key Laboratory of Regenerative Biology, Guangdong Provincial Key Laboratory of Biocomputing, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
| | - Jiawang Tao
- CAS Key Laboratory of Regenerative Biology, Guangdong Provincial Key Laboratory of Biocomputing, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
| | - Seth Blackshaw
- Solomon H. Snyder Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
| | - Jiang Qian
- Department of Ophthalmology, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
| | - Jie Wang
- CAS Key Laboratory of Regenerative Biology, Guangdong Provincial Key Laboratory of Biocomputing, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
- State Key Laboratory of Respiratory Disease, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
- China-New Zealand Joint Laboratory on Biomedicine and Health, Guangzhou 510530, China
- Corresponding author
|
123
|
Bocci F, Zhou P, Nie Q. spliceJAC: transition genes and state-specific gene regulation from single-cell transcriptome data. Mol Syst Biol 2022; 18:e11176. [PMID: 36321549 PMCID: PMC9627675 DOI: 10.15252/msb.202211176] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 06/08/2022] [Revised: 10/07/2022] [Accepted: 10/10/2022] [Indexed: 11/25/2022] Open
Abstract
Extracting dynamical information from single-cell transcriptomics is a novel task with the promise to advance our understanding of cell state transition and interactions between genes. Yet, theory-oriented, bottom-up approaches that consider differences among cell states are largely lacking. Here, we present spliceJAC, a method to quantify the multivariate mRNA splicing from single-cell RNA sequencing (scRNA-seq). spliceJAC utilizes the unspliced and spliced mRNA count matrices to construct cell state-specific gene-gene regulatory interactions and applies stability analysis to predict putative driver genes critical to the transitions between cell states. By applying spliceJAC to biological systems including pancreas endothelium development and epithelial-mesenchymal transition (EMT) in A549 lung cancer cells, we predict genes that serve specific signaling roles in different cell states, recover important differentially expressed genes in agreement with pre-existing analysis, and predict new transition genes that are either exclusive or shared between different cell state transitions.
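The "stability analysis" mentioned in this abstract can be illustrated, in its most generic textbook form, by checking the eigenvalues of a gene-gene Jacobian: a state is linearly stable when every eigenvalue has a negative real part. The sketch below handles only a 2x2 Jacobian in closed form; it is not the spliceJAC procedure, and the function names and example matrices are hypothetical.

```python
import cmath


def eigenvalues_2x2(jac):
    """Eigenvalues of a 2x2 matrix from its trace and determinant."""
    (a, b), (c, d) = jac
    trace = a + d
    det = a * d - b * c
    # cmath.sqrt handles a negative discriminant (complex eigenvalues).
    root = cmath.sqrt(trace * trace - 4 * det)
    return ((trace + root) / 2, (trace - root) / 2)


def is_stable(jac):
    """True when all eigenvalues have strictly negative real parts."""
    return all(ev.real < 0 for ev in eigenvalues_2x2(jac))
```

For example, weak mutual activation with self-decay, `[[-1, 0.5], [0.5, -1]]`, gives eigenvalues -0.5 and -1.5 and is stable, while stronger coupling, `[[-1, 2], [2, -1]]`, gives eigenvalues 1 and -3 and is unstable along one direction.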
Affiliation(s)
- Federico Bocci
- Department of Mathematics, University of California, Irvine, CA, USA
- NSF-Simons Center for Multiscale Cell Fate Research, University of California, Irvine, CA, USA
| | - Peijie Zhou
- Department of Mathematics, University of California, Irvine, CA, USA
| | - Qing Nie
- Department of Mathematics, University of California, Irvine, CA, USA
- NSF-Simons Center for Multiscale Cell Fate Research, University of California, Irvine, CA, USA
- Department of Developmental and Cell Biology, University of California, Irvine, CA, USA
|
124
|
Chen G, Liu ZP. Graph attention network for link prediction of gene regulations from single-cell RNA-sequencing data. Bioinformatics 2022; 38:4522-4529. [PMID: 35961023 DOI: 10.1093/bioinformatics/btac559] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Received: 04/29/2022] [Revised: 07/18/2022] [Accepted: 08/10/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION Single-cell RNA sequencing (scRNA-seq) data provides unprecedented opportunities to reconstruct gene regulatory networks (GRNs) at fine-grained resolution. Numerous unsupervised or self-supervised models have been proposed to infer GRNs from bulk RNA-seq data, but few of them are appropriate for scRNA-seq data, given their low signal-to-noise ratio and dropout. Fortunately, the surge in TF-DNA binding data (e.g. ChIP-seq) makes supervised GRN inference possible. We regard supervised GRN inference as a graph-based link prediction problem that aims to learn low-dimensional vectorized representations of genes to predict potential regulatory interactions. RESULTS In this paper, we present GENELink to infer latent interactions between transcription factors (TFs) and target genes in GRNs using a graph attention network. GENELink projects the single-cell gene expression with observed TF-gene pairs to a low-dimensional space. Then, the specific gene representations are learned to serve for downstream similarity measurement or causal inference of pairwise genes by optimizing the embedding space. Compared to eight existing GRN reconstruction methods, GENELink achieves comparable or better performance on seven scRNA-seq datasets with four types of ground-truth networks. We further apply GENELink on scRNA-seq of human breast cancer metastasis and reveal regulatory heterogeneity of Notch and Wnt signalling pathways between primary tumour and lung metastasis. Moreover, the ontology enrichment results of the unique lung metastasis GRN indicate that mitochondrial oxidative phosphorylation (OXPHOS) is functionally important during the seeding step of the cancer metastatic cascade, which is validated by pharmacological assays. AVAILABILITY AND IMPLEMENTATION The code and data are available at https://github.com/zpliulab/GENELink. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Guangyi Chen
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
| | - Zhi-Ping Liu
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
|
125
|
Subramanian A, Zakeri P, Mousa M, Alnaqbi H, Alshamsi FY, Bettoni L, Damiani E, Alsafar H, Saeys Y, Carmeliet P. Angiogenesis goes computational - The future way forward to discover new angiogenic targets? Comput Struct Biotechnol J 2022; 20:5235-5255. [PMID: 36187917 PMCID: PMC9508490 DOI: 10.1016/j.csbj.2022.09.019] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 07/08/2022] [Revised: 09/09/2022] [Accepted: 09/09/2022] [Indexed: 11/26/2022] Open
Abstract
Multi-omics technologies are being increasingly utilized in angiogenesis research. Yet, computational methods have not been widely used for angiogenic target discovery and prioritization in this field, partly because (wet-lab) vascular biologists are insufficiently familiar with computational biology tools and the opportunities they may offer. With this review, written for vascular biologists who lack expertise in computational methods, we aspire to break boundaries between both fields and to illustrate the potential of these tools for future angiogenic target discovery. We provide a comprehensive survey of currently available computational approaches that may be useful in prioritizing candidate genes, predicting associated mechanisms, and identifying their specificity to endothelial cell subtypes. We specifically highlight tools that use flexible, machine learning frameworks for large-scale data integration and gene prioritization. For each purpose-oriented category of tools, we describe underlying conceptual principles, highlight interesting applications and discuss limitations. Finally, we will discuss challenges and recommend some guidelines which can help to optimize the process of accurate target discovery.
Affiliation(s)
- Abhishek Subramanian
- Laboratory of Angiogenesis & Vascular Metabolism, Center for Cancer Biology, VIB, Leuven, Belgium
- Laboratory of Angiogenesis & Vascular Metabolism, Department of Oncology, KU Leuven, Leuven, Belgium
- Pooya Zakeri
- Laboratory of Angiogenesis & Vascular Heterogeneity, Department of Biomedicine, Aarhus University, Aarhus, Denmark
- Centre for Brain and Disease Research, Flanders Institute for Biotechnology (VIB), Leuven, Belgium
- Department of Neurosciences and Leuven Brain Institute, KU Leuven, Leuven, Belgium
- Mira Mousa
- Center for Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
- Halima Alnaqbi
- Center for Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
- Fatima Yousif Alshamsi
- Center for Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
- Department of Biomedical Engineering, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
- Leo Bettoni
- Laboratory of Angiogenesis & Vascular Metabolism, Center for Cancer Biology, VIB, Leuven, Belgium
- Laboratory of Angiogenesis & Vascular Metabolism, Department of Oncology, KU Leuven, Leuven, Belgium
- Ernesto Damiani
- Robotics and Intelligent Systems Institute, Khalifa University, Abu Dhabi, United Arab Emirates
- Habiba Alsafar
- Center for Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
- Department of Biomedical Engineering, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
- Yvan Saeys
- Data Mining and Modelling for Biomedicine Group, VIB Center for Inflammation Research, Ghent, Belgium
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
- Peter Carmeliet
- Laboratory of Angiogenesis & Vascular Metabolism, Center for Cancer Biology, VIB, Leuven, Belgium
- Laboratory of Angiogenesis & Vascular Metabolism, Department of Oncology, KU Leuven, Leuven, Belgium
- Laboratory of Angiogenesis & Vascular Heterogeneity, Department of Biomedicine, Aarhus University, Aarhus, Denmark
- Center for Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
126
Shu H, Ding F, Zhou J, Xue Y, Zhao D, Zeng J, Ma J. Boosting single-cell gene regulatory network reconstruction via bulk-cell transcriptomic data. Brief Bioinform 2022; 23:6693602. [PMID: 36070863 DOI: 10.1093/bib/bbac389] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/30/2022] [Revised: 08/09/2022] [Accepted: 08/11/2022] [Indexed: 11/12/2022] Open
Abstract
Computational recovery of gene regulatory network (GRN) has recently undergone a great shift from bulk-cell towards designing algorithms targeting single-cell data. In this work, we investigate whether the widely available bulk-cell data could be leveraged to assist the GRN predictions for single cells. We infer cell-type-specific GRNs from both the single-cell RNA sequencing data and the generic GRN derived from the bulk cells by constructing a weakly supervised learning framework based on the axial transformer. We verify our assumption that the bulk-cell transcriptomic data are a valuable resource, which could improve the prediction of single-cell GRN by conducting extensive experiments. Our GRN-transformer achieves the state-of-the-art prediction accuracy in comparison to existing supervised and unsupervised approaches. In addition, we show that our method can identify important transcription factors and potential regulations for Alzheimer's disease risk genes by using the predicted GRN. Availability: The implementation of GRN-transformer is available at https://github.com/HantaoShu/GRN-Transformer.
Affiliation(s)
- Hantao Shu
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
- Fan Ding
- Department of Computer Science, Purdue University, IN 47907, United States
- Jingtian Zhou
- Genomic Analysis Laboratory, The Salk Institute for Biological Studies, La Jolla, CA 92037, United States; Bioinformatics Program, University of California, San Diego, La Jolla, CA 92093, United States
- Yexiang Xue
- Department of Computer Science, Purdue University, IN 47907, United States
- Dan Zhao
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
- Jianyang Zeng
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
- Jianzhu Ma
- Institute for Artificial Intelligence, Peking University, Beijing 100091, China
127
Zheng H, Wang S, Li X, Hu H. INSISTC: Incorporating network structure information for single-cell type classification. Genomics 2022; 114:110480. [PMID: 36075505 DOI: 10.1016/j.ygeno.2022.110480] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2022] [Revised: 08/30/2022] [Accepted: 09/04/2022] [Indexed: 11/27/2022]
Abstract
Uncovering gene regulatory mechanisms in individual cells can provide insight into cell heterogeneity and function. Recently accumulated single-cell RNA-seq data have made it possible to analyze gene regulation at single-cell resolution. Understanding cell-type-specific gene regulation can assist in more accurate cell type and state identification. Computational approaches utilizing such relationships are under development. Pioneering methods that integrate gene regulatory mechanism discovery with cell-type classification encounter challenges such as determining gene regulatory relationships and incorporating gene regulatory network structure. To fill this gap, we developed INSISTC, a computational method to incorporate gene regulatory network structure information for single-cell type classification. INSISTC is capable of identifying cell-type-specific gene regulatory mechanisms while performing single-cell type classification. INSISTC demonstrated its accuracy in cell type classification and its potential for providing insight into molecular mechanisms specific to individual cells. In comparison with the alternative methods, INSISTC demonstrated its complementary performance for gene regulation interpretation.
Affiliation(s)
- Hansi Zheng
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
- Saidi Wang
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
- Xiaoman Li
- Burnett School of Biomedical Science, College of Medicine, University of Central Florida, Orlando, FL 32816, USA.
- Haiyan Hu
- Department of Computer Science, Genomics and Bioinformatics Cluster, University of Central Florida, Orlando, FL 32816, USA.
128
Zhao X, Lan Y, Chen D. Exploring long non-coding RNA networks from single cell omics data. Comput Struct Biotechnol J 2022; 20:4381-4389. [PMID: 36051880 PMCID: PMC9403499 DOI: 10.1016/j.csbj.2022.08.003] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 04/14/2022] [Revised: 08/01/2022] [Accepted: 08/01/2022] [Indexed: 11/03/2022] Open
129
Bulbul Ahmed M, Humayan Kabir A. Understanding of the various aspects of gene regulatory networks related to crop improvement. Gene 2022; 833:146556. [PMID: 35609798 DOI: 10.1016/j.gene.2022.146556] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/30/2021] [Revised: 03/14/2022] [Accepted: 05/06/2022] [Indexed: 12/30/2022]
Abstract
The hierarchical relationship between transcription factors, associated proteins, and their target genes is defined by a gene regulatory network (GRN). GRNs allow us to understand how the genotype and environment of a plant are incorporated to control the downstream physiological responses. During plant growth or environmental acclimatization, GRNs are diverse and can be differently regulated across tissue types and organs. An overview of recent advances in the development of GRNs that speed up basic and applied plant research is given here. Furthermore, genome and transcriptome studies involving GRN research are reviewed, along with exciting advances and applications. In addition, different approaches to GRN prediction are described. In this review, we also describe the role of GRNs in crop improvement, crop plant manipulation, stress responses, speed breeding and identifying genetic variations/loci. Finally, the challenges and prospects of GRNs in plant biology are discussed.
Affiliation(s)
- Md Bulbul Ahmed
- Plant Science Department, McGill University, 21111 Lakeshore Road, Ste. Anne de Bellevue H9X3V9, Quebec, Canada; Institut de Recherche en Biologie Végétale (IRBV), University of Montreal, Montréal, Québec H1X 2B2, Canada.
130
Dinh K, Wang Q. A probabilistic Boolean model on hair follicle cell fate regulation by TGF-β. Biophys J 2022; 121:2638-2652. [PMID: 35714600 PMCID: PMC9300639 DOI: 10.1016/j.bpj.2022.05.035] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/03/2021] [Revised: 05/20/2022] [Accepted: 05/23/2022] [Indexed: 11/24/2022] Open
Abstract
Hair follicles (HFs) are mini skin organs that undergo cyclic growth. Various signals regulate HF cell fate decisions jointly. Recent experimental results suggest that transforming growth factor beta (TGF-β) exhibits a dual role in HF cell fate regulation that can be either anti- or pro-apoptosis. To understand the underlying mechanisms of HF cell fate control, we develop a novel probabilistic Boolean network (pBN) model on the HF epithelial cell gene regulation dynamics. First, the model is derived from literature, then refined using single-cell RNA sequencing data. Using the model, we both explore the mechanisms underlying HF cell fate decisions and make predictions that could potentially guide future experiments: 1) we propose that a threshold-like switch in the TGF-β strength may necessitate the dual roles of TGF-β in either activating apoptosis or cell proliferation, in cooperation with bone morphogenetic protein (BMP) and tumor necrosis factor (TNF) and at different stages of a follicle growth cycle; 2) our model shows concordance with the high-activator-low-inhibitor theory of anagen initiation; 3) we predict that TNF may be more effective in catagen initiation than TGF-β, and they may cooperate in a two-step fashion; 4) finally, predictions of gene knockout and overexpression reveal the roles in HF cell fate regulations of each gene. Attractor and motif analysis from the associated Boolean networks reveal the relations between the topological structure of the gene regulation network and the cell fate regulation mechanism. A discrete spatial model equipped with the pBN illustrates how TGF-β and TNF cooperate in initiating and driving the apoptosis wave during catagen.
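The probabilistic Boolean network idea in this abstract can be illustrated with a minimal Python sketch. The three node names, their candidate update rules and the selection probabilities are hypothetical, chosen only for illustration; they are not the rules of the published model.

```python
import random

# Hypothetical toy network (an assumption, not the published model):
# each gene has several candidate Boolean rules, each paired with a
# selection probability; probabilities sum to 1 for every gene.
RULES = {
    "TGFb": [(1.0, lambda s: s["TGFb"])],  # external input, held fixed
    "BMP": [(0.7, lambda s: not s["TGFb"]),
            (0.3, lambda s: s["BMP"])],
    "Apoptosis": [(0.6, lambda s: s["TGFb"] and not s["BMP"]),
                  (0.4, lambda s: s["TGFb"])],
}

def pbn_step(state, rng):
    """Synchronous update: for each gene, draw one rule by its probability and apply it."""
    new_state = {}
    for gene, candidates in RULES.items():
        weights = [p for p, _ in candidates]
        funcs = [f for _, f in candidates]
        rule = rng.choices(funcs, weights=weights, k=1)[0]
        new_state[gene] = bool(rule(state))
    return new_state

def simulate(state, steps, seed=0):
    """Return the list of states visited, starting with the initial state."""
    rng = random.Random(seed)
    trajectory = [dict(state)]
    for _ in range(steps):
        state = pbn_step(state, rng)
        trajectory.append(state)
    return trajectory

trajectory = simulate({"TGFb": True, "BMP": False, "Apoptosis": False}, steps=5)
```

Running many such trajectories from different initial states and seeds is one simple way to estimate which states the network settles into, which is the kind of question attractor analysis addresses.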
Affiliation(s)
- Katherine Dinh
- Department of Biology, University of California, Riverside, California
- Qixuan Wang
- Department of Mathematics, University of California, Riverside, California; Interdisciplinary Center for Quantitative Modeling in Biology, University of California, Riverside, California.
131
Ellis D, Wu D, Datta S. SAREV: A review on statistical analytics of single-cell RNA sequencing data. Wiley Interdiscip Rev Comput Stat 2022; 14:e1558. [PMID: 36034329 PMCID: PMC9400796 DOI: 10.1002/wics.1558] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2020] [Accepted: 04/09/2021] [Indexed: 06/15/2023]
Abstract
Due to the development of next-generation RNA sequencing (NGS) technologies, there has been tremendous progress in research involving determining the role of genomics, transcriptomics and epigenomics in complex biological systems. However, scientists have realized that information obtained using earlier technology, frequently called 'bulk RNA-seq' data, provides information averaged across all the cells present in a tissue. Relatively newly developed single cell (scRNA-seq) technology allows us to provide transcriptomic information at a single-cell resolution. Nevertheless, these high-resolution data have their own complex natures and demand novel statistical data analysis methods to provide effective and highly accurate results on complex biological systems. In this review, we cover many such recently developed statistical methods for researchers wanting to pursue scRNA-seq statistical and computational research as well as scientific research about these existing methods and free software tools available for their generated data. This review is certainly not exhaustive due to page limitations. We have tried to cover the popular methods starting from quality control to the downstream analysis of finding differentially expressed genes and concluding with a brief description of network analysis.
Affiliation(s)
- Dorothy Ellis
- Department of Biostatistics, University of Florida, School of Public Health and Health Professions, Gainesville, FL
- Dongyuan Wu
- Department of Biostatistics, University of Florida, School of Public Health and Health Professions, Gainesville, FL
- Susmita Datta
- Department of Biostatistics, University of Florida, School of Public Health and Health Professions, Gainesville, FL
132
Yang B, Bao W, Chen B, Song D. Single_cell_GRN: gene regulatory network identification based on supervised learning method and Single-cell RNA-seq data. BioData Min 2022; 15:13. [PMID: 35690842 PMCID: PMC9188720 DOI: 10.1186/s13040-022-00297-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2022] [Accepted: 05/22/2022] [Indexed: 11/30/2022] Open
Abstract
Single-cell RNA-seq overcomes the shortcomings of conventional transcriptome sequencing technology and could provide a powerful tool for distinguishing the transcriptome characteristics of various cell types in biological tissues, and comprehensively revealing the heterogeneity of gene expression between cells. Many intelligent computing methods have been presented to infer gene regulatory networks (GRNs) from single-cell RNA-seq data. In this paper, we investigate the performance of seven classifiers, namely support vector machine (SVM), random forest (RF), naive Bayes (NB), gradient boosting decision tree (GBDT), logistic regression (LR), decision tree (DT) and K-nearest neighbor (KNN), for solving the binary classification problem of GRN inference with single-cell RNA-seq data (Single_cell_GRN). In SVM, three different kernel functions (linear, polynomial and radial basis function) are evaluated. Three real single-cell RNA-seq datasets from mouse and human are used. The experimental results show that in most cases the supervised learning methods (SVM, RF, NB, GBDT, LR, DT and KNN) perform better than the unsupervised learning method (GENIE3) in terms of AUC. SVM, RF and KNN perform better than the other four classifiers. In SVM, linear and polynomial kernels are better suited to modeling single-cell RNA-seq data.
Affiliation(s)
- Bin Yang
- School of Information Science and Engineering, Zaozhuang University, Zaozhuang, 277160, China
- Wenzheng Bao
- School of Information and Electrical Engineering, Xuzhou University of Technology, Xuzhou, 221018, China.
- Baitong Chen
- Xuzhou First People's Hospital, Xuzhou, 221000, China
- Dan Song
- School of Information Science and Engineering, Zaozhuang University, Zaozhuang, 277160, China.
133
Gan Y, Hu X, Zou G, Yan C, Xu G. Inferring Gene Regulatory Networks From Single-Cell Transcriptomic Data Using Bidirectional RNN. Front Oncol 2022; 12:899825. [PMID: 35692809 PMCID: PMC9178250 DOI: 10.3389/fonc.2022.899825] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2022] [Accepted: 04/22/2022] [Indexed: 11/30/2022] Open
Abstract
Accurate inference of gene regulatory rules is critical to understanding cellular processes. Existing computational methods usually decompose the inference of gene regulatory networks (GRNs) into multiple subproblems, rather than detecting potential causal relationships simultaneously, which limits the application to data with a small number of genes. Here, we propose BiRGRN, a novel computational algorithm for inferring GRNs from time-series single-cell RNA-seq (scRNA-seq) data. BiRGRN utilizes a bidirectional recurrent neural network to infer GRNs. The recurrent neural network is a deep neural network that can capture complex, non-linear, and dynamic relationships among variables. It maps neurons to genes, and maps the connections between neural network layers to the regulatory relationship between genes, providing an intuitive solution to model GRNs with biological closeness and mathematical flexibility. Based on the deep network, we transform the inference of GRNs into a regression problem, using the gene expression data at previous time points to predict the gene expression data at the later time point. Furthermore, we adopt two strategies to improve the accuracy and stability of the algorithm. Specifically, we utilize a bidirectional structure to integrate the forward and reverse inference results and exploit an incomplete set of prior knowledge to filter out some candidate inferences of low confidence. BiRGRN is applied to four simulated datasets and three real scRNA-seq datasets to verify the proposed method. We perform comprehensive comparisons between our proposed method and other state-of-the-art techniques. These experimental results indicate that BiRGRN is capable of inferring GRNs simultaneously from time-series scRNA-seq data. Our method BiRGRN is implemented in Python using the TensorFlow machine-learning library, and it is freely available at https://gitee.com/DHUDBLab/bi-rgrn.
Affiliation(s)
- Yanglan Gan
- School of Computer Science and Technology, Donghua University, Shanghai, China
- Xin Hu
- School of Computer Science and Technology, Donghua University, Shanghai, China
- Guobing Zou
- School of Computer Engineering and Science, Shanghai University, Shanghai, China
- Cairong Yan
- School of Computer Science and Technology, Donghua University, Shanghai, China
- Guangwei Xu
- School of Computer Science and Technology, Donghua University, Shanghai, China
134
Sonawane AR, Aikawa E, Aikawa M. Connections for Matters of the Heart: Network Medicine in Cardiovascular Diseases. Front Cardiovasc Med 2022; 9:873582. [PMID: 35665246 PMCID: PMC9160390 DOI: 10.3389/fcvm.2022.873582] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/11/2022] [Accepted: 04/19/2022] [Indexed: 01/18/2023] Open
Abstract
Cardiovascular diseases (CVD) are diverse disorders affecting the heart and vasculature in millions of people worldwide. Like other fields, CVD research has benefitted from the deluge of multiomics biomedical data. Current CVD research focuses on disease etiologies and mechanisms, identifying disease biomarkers, developing appropriate therapies and drugs, and stratifying patients into correct disease endotypes. Systems biology offers an alternative to traditional reductionist approaches and provides impetus for a comprehensive outlook toward diseases. As a focus area, network medicine specifically aids the translational aspect of in silico research. This review discusses the approach of network medicine and its application to CVD research.
Affiliation(s)
- Abhijeet Rajendra Sonawane
- Center for Interdisciplinary Cardiovascular Sciences, Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, United States
- Center for Excellence in Vascular Biology, Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, United States
- Elena Aikawa
- Center for Interdisciplinary Cardiovascular Sciences, Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, United States
- Center for Excellence in Vascular Biology, Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, United States
- Masanori Aikawa
- Center for Interdisciplinary Cardiovascular Sciences, Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, United States
- Center for Excellence in Vascular Biology, Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, United States
135
Caligola S, De Sanctis F, Canè S, Ugel S. Breaking the Immune Complexity of the Tumor Microenvironment Using Single-Cell Technologies. Front Genet 2022; 13:867880. [PMID: 35651929 PMCID: PMC9149246 DOI: 10.3389/fgene.2022.867880] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/01/2022] [Accepted: 04/27/2022] [Indexed: 12/31/2022] Open
Abstract
Tumors are not a simple aggregate of transformed cells but rather a complicated ecosystem containing various components, including infiltrating immune cells, tumor-related stromal cells, endothelial cells, soluble factors, and extracellular matrix proteins. Profiling the immune contexture of this intricate framework is now mandatory to develop more effective cancer therapies and precise immunotherapeutic approaches by identifying exact targets or predictive biomarkers, respectively. Conventional technologies are limited in reaching this goal because they lack high resolution. Recent developments in single-cell technologies, such as single-cell RNA transcriptomics, mass cytometry, and multiparameter immunofluorescence, have revolutionized the cancer immunology field, capturing the heterogeneity of tumor-infiltrating immune cells and the dynamic complexity of tenets that regulate cell networks in the tumor microenvironment. In this review, we describe some of the current single-cell technologies and computational techniques applied for immune-profiling the cancer landscape and discuss future directions of how integrating multi-omics data can guide a new "precision oncology" advancement.
Affiliation(s)
- Stefano Ugel
- Immunology Section, Department of Medicine, University of Verona, Verona, Italy
136
Inference of Molecular Regulatory Systems Using Statistical Path-Consistency Algorithm. Entropy 2022; 24:e24050693. [PMID: 35626576 PMCID: PMC9142129 DOI: 10.3390/e24050693] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2022] [Revised: 05/12/2022] [Accepted: 05/12/2022] [Indexed: 11/16/2022]
Abstract
One of the key challenges in systems biology and molecular sciences is how to infer regulatory relationships between genes and proteins using high-throughput omics datasets. Although a wide range of methods have been designed to reverse engineer the regulatory networks, recent studies show that the inferred network may depend on the variable order in the dataset. In this work, we develop a new algorithm, called the statistical path-consistency algorithm (SPCA), to solve the problem of the dependence on variable order. This method generates a number of different variable orders using random samples, and then infers a network by using the path-consistency algorithm based on each variable order. We propose measures to determine the edge weights using the corresponding edge weights in the inferred networks, and choose the edges with the largest weights as the putative regulations between genes or proteins. The developed method is rigorously assessed on the six benchmark networks in DREAM challenges, the mitogen-activated protein (MAP) kinase pathway, and a cancer-specific gene regulatory network. The inferred networks are compared with those obtained by using two up-to-date inference methods. The accuracy of the inferred networks shows that the developed method is effective for discovering molecular regulatory systems.
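The order-averaging idea described in this abstract (infer one network per random variable order, then weight each edge by how consistently it is recovered) can be sketched as follows. This is only an illustration under stated assumptions: `infer_for_order` is a deliberately simple, order-dependent stand-in and is not the path-consistency algorithm, the association scores are made up, and using the recovery frequency as the edge weight is one possible choice rather than the measure proposed in the paper.

```python
import random
from collections import Counter

def infer_for_order(order, assoc):
    """Stand-in, order-dependent inference (NOT the path-consistency algorithm):
    each variable links to its most strongly associated predecessor in `order`."""
    edges = set()
    for i, v in enumerate(order):
        if i == 0:
            continue  # the first variable has no predecessor to link to
        best = max(order[:i], key=lambda u: assoc[frozenset((u, v))])
        edges.add(frozenset((best, v)))
    return edges

def aggregate_over_orders(variables, assoc, n_orders=200, seed=0):
    """Infer one network per random variable order; weight = recovery frequency."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_orders):
        order = variables[:]
        rng.shuffle(order)
        counts.update(infer_for_order(order, assoc))
    return {edge: c / n_orders for edge, c in counts.items()}

# Hypothetical pairwise association scores for four variables.
variables = ["A", "B", "C", "D"]
assoc = {frozenset(p): w for p, w in [
    (("A", "B"), 0.9), (("B", "C"), 0.8), (("C", "D"), 0.7),
    (("A", "C"), 0.3), (("A", "D"), 0.1), (("B", "D"), 0.2),
]}
weights = aggregate_over_orders(variables, assoc)
# Keep the edges with the largest aggregated weights as putative regulations.
top_edges = sorted(weights, key=weights.get, reverse=True)[:3]
```

Edges that survive under most orderings receive weights close to 1, while edges that appear only under particular orderings receive low weights and are filtered out by the final ranking.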
137
Hang Y, Burns J, Shealy BT, Pauly R, Ficklin SP, Feltus FA. Identification of condition-specific regulatory mechanisms in normal and cancerous human lung tissue. BMC Genomics 2022; 23:350. [PMID: 35524179 PMCID: PMC9077899 DOI: 10.1186/s12864-022-08591-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2021] [Accepted: 04/25/2022] [Indexed: 12/24/2022] Open
Abstract
Background: Lung cancer is the leading cause of cancer death in both men and women. The most common lung cancer subtype is non-small cell lung carcinoma (NSCLC) comprising about 85% of all cases. NSCLC can be further divided into three subtypes: adenocarcinoma (LUAD), squamous cell carcinoma (LUSC), and large cell lung carcinoma. Specific genetic mutations and epigenetic aberrations play an important role in the developmental transition to a specific tumor subtype. The elucidation of normal lung versus lung tumor gene expression patterns and regulatory targets yields biomarker systems that discriminate lung phenotypes (i.e., biomarkers) and provide a foundation for the discovery of normal and aberrant gene regulatory mechanisms. Results: We built condition-specific gene co-expression networks (csGCNs) for normal lung, LUAD, and LUSC conditions. Then, we integrated normal lung tissue-specific gene regulatory networks (tsGRNs) to elucidate control-target biomarker systems for normal and cancerous lung tissue. We characterized co-expressed gene edges, possibly under common regulatory control, for relevance in lung cancer. Conclusions: Our approach demonstrates the ability to elucidate csGCN:tsGRN merged biomarker systems based on gene expression correlation and regulation. The biomarker systems we describe can be used to classify and further describe lung specimens. Our approach is generalizable and can be used to discover and interpret complex gene expression patterns for any condition or species. Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-022-08591-9.
Affiliation(s)
- Yuqing Hang
- Department of Genetics & Biochemistry, Clemson University, Clemson, 29634, USA
- Josh Burns
- Department of Horticulture, Washington State University, Pullman, 99164, USA
- Benjamin T Shealy
- Department of Electrical and Computer Engineering, Clemson University, Clemson, 29634, USA
- Rini Pauly
- Biomedical Data Science and Informatics Program, Clemson University, Clemson, 29634, USA
- Stephen P Ficklin
- Department of Horticulture, Washington State University, Pullman, 99164, USA
- Frank A Feltus
- Department of Genetics & Biochemistry, Clemson University, Clemson, 29634, USA; Biomedical Data Science and Informatics Program, Clemson University, Clemson, 29634, USA; Center for Human Genetics, Clemson University, Clemson, 29634, USA; Biosystems Research Complex, 302C, 105 Collings St, Clemson, SC, 29634, USA.
138
Liu R, Pisco AO, Braun E, Linnarsson S, Zou J. Dynamical Systems Model of RNA Velocity Improves Inference of Single-cell Trajectory, Pseudo-time and Gene Regulation. J Mol Biol 2022; 434:167606. [PMID: 35489382 DOI: 10.1016/j.jmb.2022.167606] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 12/04/2021] [Revised: 04/19/2022] [Accepted: 04/21/2022] [Indexed: 11/24/2022]
Abstract
Recent developments in inferring RNA velocity from single-cell RNA-seq open up exciting new vistas into developmental lineage and cellular dynamics. However, the estimated velocity only gives a snapshot of how the transcriptome instantaneously changes in individual cells, and it does not provide quantitative predictions and insights about the whole system. In this work, we develop RNA-ODE, a principled computational framework that extends RNA velocity to quantify systems level dynamics and improve single-cell data analysis. We model the gene expression dynamics by an ordinary differential equation (ODE) based formalism. Given a snapshot of gene expression at one time, RNA-ODE is able to predict and extrapolate the expression trajectory of each cell by solving the dynamic equations. Systematic experiments on simulations and on new data from developing brain demonstrate that RNA-ODE substantially improves many aspects of standard single-cell analysis. By leveraging temporal dynamics, RNA-ODE more accurately estimates cell state lineage and pseudo-time compared to previous state-of-the-art methods. It also infers gene regulatory networks and identifies influential genes whose expression changes can decide cell fate. We expect RNA-ODE to be a Swiss army knife that aids many facets of single-cell RNA-seq analysis.
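To illustrate what extrapolating an expression trajectory from a single snapshot by solving an ODE can look like, here is a minimal Python sketch. The linear velocity function, the interaction matrix `W`, the degradation rate `gamma` and the forward-Euler integrator are all assumptions made for illustration; the abstract does not specify the functional form or solver used by RNA-ODE.

```python
def velocity(x, W, gamma):
    """dx/dt for each gene: regulated production minus first-order degradation."""
    n = len(x)
    return [sum(W[i][j] * x[j] for j in range(n)) - gamma * x[i] for i in range(n)]

def extrapolate(x0, W, gamma, dt=0.01, steps=500):
    """Forward-Euler integration of the ODE, starting from one expression snapshot."""
    x = list(x0)
    path = [list(x)]
    for _ in range(steps):
        v = velocity(x, W, gamma)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        path.append(list(x))
    return path

# Hypothetical 2-gene system: gene 0 activates gene 1; both decay at rate gamma.
W = [[0.0, 0.0],
     [1.0, 0.0]]
path = extrapolate(x0=[1.0, 0.0], W=W, gamma=1.0)
```

In this toy system gene 0 decays monotonically while gene 1 rises transiently and then decays, so the predicted path orders cells along a pseudo-time axis even though only a single starting snapshot was supplied.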
Affiliation(s)
- Ruishan Liu
- Department of Electrical Engineering, Stanford University, Stanford, CA, USA
- James Zou
- Department of Electrical Engineering, Stanford University, Stanford, CA, USA; Chan-Zuckerberg Biohub, San Francisco, CA, USA; Department of Biomedical Data Science, Stanford University, Stanford, CA, USA.
139
SimiC enables the inference of complex gene regulatory dynamics across cell phenotypes. Commun Biol 2022; 5:351. [PMID: 35414121 PMCID: PMC9005655 DOI: 10.1038/s42003-022-03319-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2021] [Accepted: 03/24/2022] [Indexed: 11/08/2022] Open
Abstract
Single-cell RNA-Sequencing has the potential to provide deep biological insights by revealing complex regulatory interactions across diverse cell phenotypes at single-cell resolution. However, current single-cell gene regulatory network inference methods produce a single regulatory network per input dataset, limiting their capability to uncover complex regulatory relationships across related cell phenotypes. We present SimiC, a single-cell gene regulatory inference framework that overcomes this limitation by jointly inferring distinct, but related, gene regulatory dynamics per phenotype. We show that SimiC uncovers key regulatory dynamics missed by previously proposed methods across a range of systems, both model and non-model alike. In particular, SimiC was able to uncover CAR T cell dynamics after tumor recognition and key regulatory patterns on a regenerating liver, and was able to implicate glial cells in the generation of distinct behavioral states in honeybees. SimiC hence establishes a new approach to quantitating regulatory architectures between distinct cellular phenotypes, with far-reaching implications for systems biology.
|
140
|
Zhang Y, He Y, Chen Q, Yang Y, Gong M. Fusion prior gene network for high reliable single-cell gene regulatory network inference. Comput Biol Med 2022; 143:105279. [PMID: 35134605 DOI: 10.1016/j.compbiomed.2022.105279] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2021] [Revised: 01/25/2022] [Accepted: 01/29/2022] [Indexed: 11/03/2022]
Abstract
Single-cell RNA sequencing technology provides an opportunity to discover gene regulatory networks (GRNs) that control cell differentiation and drive cell type transformation. However, the sequencing data suffer from high loss and high noise, and the inferred networks contain many pseudo-connections. To solve these problems, we propose a framework called Fusion prior gene network for Gene Regulatory Network inference Accuracy Enhancement (FGRNAE) to infer a highly reliable gene regulatory network. Specifically, based on the Single-Cell RNA-sequencing Network Propagation and network Fusion (scNPF) preprocessing framework, we employ the Random Walk with Restart on the prior gene network to interpolate the missing data. Furthermore, we infer the network using the Random Forest algorithm with the results achieved above. In addition, we apply data from the Co-Function Network to build a meta-gene network and select the regulatory connections with the Markov Random Field. Extensive experiments based on datasets from BEELINE validate the effectiveness of our framework for improving the accuracy of inference.
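As an illustration of the Random Walk with Restart step mentioned above, the following Python sketch propagates a per-gene signal over a small prior network. It is a generic textbook formulation, not the FGRNAE or scNPF code: the function name, the column normalisation, the restart probability of 0.5, and the three-gene path graph are all assumptions.

```python
from typing import List


def random_walk_with_restart(adjacency: List[List[float]],
                             seed: List[float],
                             restart: float = 0.5,
                             tol: float = 1e-10,
                             max_iter: int = 1000) -> List[float]:
    """Iterate p <- (1 - restart) * W p + restart * p0 until convergence.

    W is the column-normalised adjacency matrix of the prior gene network and
    p0 is the seed signal (e.g. observed expression) rescaled to sum to one.
    """
    n = len(adjacency)
    total = sum(seed)
    if total <= 0:
        raise ValueError("seed must contain positive mass")
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    W = [[adjacency[i][j] / col_sums[j] if col_sums[j] > 0 else 0.0
          for j in range(n)] for i in range(n)]
    p0 = [s / total for s in seed]
    p = list(p0)
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(W[i][j] * p[j] for j in range(n))
               + restart * p0[i] for i in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Hypothetical prior network: a path gene0 - gene1 - gene2, signal observed only on gene0.
path_graph = [[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]]
scores = random_walk_with_restart(path_graph, seed=[1.0, 0.0, 0.0])
```

Genes closer to the seed in the prior network receive more of the propagated signal, which is the property that makes this kind of smoothing usable for filling in missing values.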
Affiliation(s)
- Yongqing Zhang
- School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China; School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 610054, China
- Yuchen He
- School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China
- Qingyuan Chen
- School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China
- Yihan Yang
- International College, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
- Meiqin Gong
- West China Second University Hospital, Sichuan University, Chengdu, 610041, China.
|
141
|
Wang M, Song WM, Ming C, Wang Q, Zhou X, Xu P, Krek A, Yoon Y, Ho L, Orr ME, Yuan GC, Zhang B. Guidelines for bioinformatics of single-cell sequencing data analysis in Alzheimer's disease: review, recommendation, implementation and application. Mol Neurodegener 2022; 17:17. [PMID: 35236372 PMCID: PMC8889402 DOI: 10.1186/s13024-022-00517-z] [Citation(s) in RCA: 56] [Impact Index Per Article: 18.7] [Received: 07/04/2021] [Accepted: 01/18/2022] [Indexed: 12/13/2022]
Abstract
Alzheimer's disease (AD) is the most common form of dementia, characterized by progressive cognitive impairment and neurodegeneration. Extensive clinical and genomic studies have revealed biomarkers, risk factors, pathways, and targets of AD in the past decade. However, the exact molecular basis of AD development and progression remains elusive. The emerging single-cell sequencing technology can potentially provide cell-level insights into the disease. Here we systematically review the state-of-the-art bioinformatics approaches to analyze single-cell sequencing data and their applications to AD in 14 major directions, including 1) quality control and normalization, 2) dimension reduction and feature extraction, 3) cell clustering analysis, 4) cell type inference and annotation, 5) differential expression, 6) trajectory inference, 7) copy number variation analysis, 8) integration of single-cell multi-omics, 9) epigenomic analysis, 10) gene network inference, 11) prioritization of cell subpopulations, 12) integrative analysis of human and mouse scRNA-seq data, 13) spatial transcriptomics, and 14) comparison of single-cell AD mouse model studies and single-cell human AD studies. We also address challenges in using human postmortem and mouse tissues and outline future developments in single-cell sequencing data analysis. Importantly, we have implemented our recommended workflow for each major analytic direction and applied the workflows to a large single-nucleus RNA-sequencing (snRNA-seq) dataset in AD. Key analytic results are reported while the scripts and the data are shared with the research community through GitHub. In summary, this comprehensive review provides insights into various approaches to analyze single-cell sequencing data and offers specific guidelines for study design and a variety of analytic directions. The review and the accompanying software tools will serve as a valuable resource for studying cellular and molecular mechanisms of AD, other diseases, or biological systems at the single-cell level.
Affiliation(s)
- Minghui Wang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Won-min Song
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Chen Ming
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Qian Wang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Xianxiao Zhou
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Peng Xu
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Azra Krek
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029 USA
- Yonejung Yoon
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Lap Ho
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Miranda E. Orr
- Department of Internal Medicine, Section of Gerontology and Geriatric Medicine, Wake Forest School of Medicine, Winston-Salem, North Carolina USA
- Sticht Center for Healthy Aging and Alzheimer’s Prevention, Wake Forest School of Medicine, Winston-Salem, North Carolina USA
- Guo-Cheng Yuan
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029 USA
- Bin Zhang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Icahn Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
|
142
|
Single-Cell Transcriptome and Network Analyses Unveil Key Transcription Factors Regulating Mesophyll Cell Development in Maize. Genes (Basel) 2022; 13:genes13020374. [PMID: 35205426 PMCID: PMC8872562 DOI: 10.3390/genes13020374] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 01/18/2022] [Revised: 02/14/2022] [Accepted: 02/17/2022] [Indexed: 12/17/2022]
Abstract
Background: Maize mesophyll (M) cells play important roles in various biological processes such as photosynthesis II and secondary metabolism. Functional differentiation occurs during M-cell development, but the underlying mechanisms for regulating M-cell development are largely unknown. Results: We conducted single-cell RNA sequencing (scRNA-seq) to profile transcripts in maize leaves. We then identified coregulated modules by analyzing the resulting pseudo-time-series data through gene regulatory network analyses. WRKY, ERF, NAC, MYB and Heat stress transcription factor (HSF) families were highly expressed in the early stage, whereas CONSTANS (CO)-like (COL) and ERF families were highly expressed in the late stage of M-cell development. Construction of regulatory networks revealed that these transcription factor (TF) families, especially HSF and COL, were the major players in the early and later stages of M-cell development, respectively. Integration of the scRNA expression matrix with TF ChIP-seq and Hi-C further revealed regulatory interactions between these TFs and their targets. HSF1 and COL8 were primarily expressed in the leaf bases and tips, respectively, and their targets were validated with protoplast-based ChIP-qPCR, with the binding sites of HSF1 being experimentally confirmed. Conclusions: Our study provides evidence that several TF families, with the involvement of epigenetic regulation, play vital roles in the regulation of M-cell development in maize.
|
143
|
Su EY, Spangler A, Bian Q, Kasamoto JY, Cahan P. Reconstruction of dynamic regulatory networks reveals signaling-induced topology changes associated with germ layer specification. Stem Cell Reports 2022; 17:427-442. [PMID: 35090587 PMCID: PMC8828556 DOI: 10.1016/j.stemcr.2021.12.018] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 11/18/2021] [Revised: 12/21/2021] [Accepted: 12/26/2021] [Indexed: 11/17/2022]
Abstract
Elucidating regulatory relationships between transcription factors (TFs) and target genes is fundamental to understanding how cells control their identity and behavior. Unfortunately, existing computational gene regulatory network (GRN) reconstruction methods are imprecise, computationally burdensome, and fail to reveal dynamic regulatory topologies. Here, we present Epoch, a reconstruction tool that uses single-cell transcriptomics to accurately infer dynamic networks. We apply Epoch to identify the dynamic networks underpinning directed differentiation of mouse embryonic stem cells (ESCs) guided by multiple signaling pathways, and we demonstrate that modulating these pathways drives topological changes that bias cell fate potential. We also find that Peg3 rewires the pluripotency network to favor mesoderm specification. By integrating signaling pathways with GRNs, we trace how Wnt activation and PI3K suppression govern mesoderm and endoderm specification, respectively. Finally, we identify regulatory circuits of patterning and axis formation that distinguish in vitro and in vivo mesoderm specification.
Affiliation(s)
- Emily Y Su
- Institute for Cell Engineering, Johns Hopkins School of Medicine, Baltimore, MD 21205, USA; Department of Biomedical Engineering, Johns Hopkins School of Medicine, Baltimore, MD 21205, USA
- Abby Spangler
- Institute for Cell Engineering, Johns Hopkins School of Medicine, Baltimore, MD 21205, USA
- Qin Bian
- Institute for Cell Engineering, Johns Hopkins School of Medicine, Baltimore, MD 21205, USA; Department of Biomedical Engineering, Johns Hopkins School of Medicine, Baltimore, MD 21205, USA
- Jessica Y Kasamoto
- Department of Biomedical Engineering, Johns Hopkins School of Medicine, Baltimore, MD 21205, USA
- Patrick Cahan
- Institute for Cell Engineering, Johns Hopkins School of Medicine, Baltimore, MD 21205, USA; Department of Biomedical Engineering, Johns Hopkins School of Medicine, Baltimore, MD 21205, USA; Department of Molecular Biology and Genetics, Johns Hopkins School of Medicine, Baltimore, MD 21205, USA.
|
144
|
Deshpande A, Chu LF, Stewart R, Gitter A. Network inference with Granger causality ensembles on single-cell transcriptomics. Cell Rep 2022; 38:110333. [PMID: 35139376 PMCID: PMC9093087 DOI: 10.1016/j.celrep.2022.110333] [Citation(s) in RCA: 50] [Impact Index Per Article: 16.7] [Received: 02/05/2019] [Revised: 02/19/2021] [Accepted: 01/12/2022] [Indexed: 12/20/2022]
Abstract
Cellular gene expression changes throughout a dynamic biological process, such as differentiation. Pseudotimes estimate cells' progress along a dynamic process based on their individual gene expression states. Ordering the expression data by pseudotime provides information about the underlying regulator-gene interactions. Because the pseudotime distribution is not uniform, many standard mathematical methods are inapplicable for analyzing the ordered gene expression states. Here we present single-cell inference of networks using Granger ensembles (SINGE), an algorithm for gene regulatory network inference from ordered single-cell gene expression data. SINGE uses kernel-based Granger causality regression to smooth irregular pseudotimes and missing expression values. It aggregates predictions from an ensemble of regression analyses to compile a ranked list of candidate interactions between transcriptional regulators and target genes. In two mouse embryonic stem cell differentiation datasets, SINGE outperforms other contemporary algorithms. However, a more detailed examination reveals caveats about poor performance for individual regulators and uninformative pseudotimes.
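The reason irregular pseudotimes call for kernel weighting can be shown with a small Python sketch: instead of taking "the value one step back", a lagged predictor is estimated as a Gaussian-weighted average of cells that precede the target cell. This is only a sketch of that smoothing idea, not the SINGE algorithm; the function name `kernel_lagged_value`, the Gaussian kernel, and the `lag` and `bandwidth` parameters are assumptions.

```python
import math
from typing import Sequence


def kernel_lagged_value(times: Sequence[float],
                        values: Sequence[float],
                        t: float,
                        lag: float,
                        bandwidth: float) -> float:
    """Estimate a regulator's expression at pseudotime t - lag.

    Only cells with pseudotime strictly before t contribute, so the predictor
    uses past information only, as a Granger-style regression requires.
    """
    center = t - lag
    num = 0.0
    den = 0.0
    for ti, vi in zip(times, values):
        if ti >= t:
            continue  # skip the present and the future
        w = math.exp(-0.5 * ((ti - center) / bandwidth) ** 2)
        num += w * vi
        den += w
    if den == 0.0:
        raise ValueError("no usable cells precede pseudotime t")
    return num / den


# Irregularly spaced pseudotimes with a constant regulator: the estimate stays constant.
flat = kernel_lagged_value([0.0, 0.4, 1.1, 1.5, 2.7, 3.0], [2.0] * 6,
                           t=3.0, lag=1.0, bandwidth=0.5)

# A linearly increasing regulator: the lagged estimate lies in the past range.
rising = kernel_lagged_value([0.0, 1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 2.0, 3.0, 4.0],
                             t=3.0, lag=1.0, bandwidth=1.0)
```

Such lagged estimates could then serve as predictors in a regression of a target gene on candidate regulators; an ensemble would repeat this over several lags and bandwidths.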
Affiliation(s)
- Atul Deshpande
- Department of Electrical and Computer Engineering, University of Wisconsin - Madison, Madison, WI 53706, USA; Morgridge Institute for Research, Madison, WI 53715, USA
- Li-Fang Chu
- Morgridge Institute for Research, Madison, WI 53715, USA
- Ron Stewart
- Morgridge Institute for Research, Madison, WI 53715, USA
- Anthony Gitter
- Morgridge Institute for Research, Madison, WI 53715, USA; Department of Biostatistics and Medical Informatics, University of Wisconsin - Madison, Madison, WI 53792, USA.
|
145
|
Sekula M, Gaskins J, Datta S. Single-Cell Differential Network Analysis with Sparse Bayesian Factor Models. Front Genet 2022; 12:810816. [PMID: 35186014 PMCID: PMC8855158 DOI: 10.3389/fgene.2021.810816] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2021] [Accepted: 12/21/2021] [Indexed: 11/13/2022]
Abstract
Differential network analysis plays an important role in learning how gene interactions change under different biological conditions, and the high resolution of single-cell RNA sequencing (scRNA-seq) provides new opportunities to explore these changing gene-gene interactions. Here, we present a sparse hierarchical Bayesian factor model to identify differences across network structures from different biological conditions in scRNA-seq data. Our methodology utilizes latent factors to impact gene expression values for each cell to help account for zero-inflation, increased cell-to-cell variability, and overdispersion that are unique characteristics of scRNA-seq data. Condition-dependent parameters determine which latent factors are activated in a gene, which allows for not only the calculation of gene-gene co-expression within each group but also the calculation of the co-expression differences between groups. We highlight our methodology’s performance in detecting differential gene-gene associations across groups by analyzing simulated datasets and a SARS-CoV-2 case study dataset.
Affiliation(s)
- Michael Sekula
- Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, KY, United States
- Jeremy Gaskins
- Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, KY, United States
- Susmita Datta
- Department of Biostatistics, University of Florida, Gainesville, FL, United States
- *Correspondence: Susmita Datta,
|
146
|
Clark NM, Elmore JM, Walley JW. To the proteome and beyond: advances in single-cell omics profiling for plant systems. Plant Physiology 2022; 188:726-737. [PMID: 35235661 PMCID: PMC8825333 DOI: 10.1093/plphys/kiab429] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Received: 06/02/2021] [Accepted: 08/16/2021] [Indexed: 05/19/2023]
Abstract
Recent advances in single-cell proteomics for animal systems could be adapted for plants to increase our understanding of plant development, response to stimuli, and cell-to-cell signaling.
Affiliation(s)
- Natalie M Clark
- Department of Plant Pathology and Microbiology, Iowa State University, Ames, Iowa 50011, USA
- James Mitch Elmore
- Department of Plant Pathology and Microbiology, Iowa State University, Ames, Iowa 50011, USA
- Justin W Walley
- Department of Plant Pathology and Microbiology, Iowa State University, Ames, Iowa 50011, USA
|
147
|
Swift J, Greenham K, Ecker JR, Coruzzi GM, McClung CR. The biology of time: dynamic responses of cell types to developmental, circadian and environmental cues. The Plant Journal: for Cell and Molecular Biology 2022; 109:764-778. [PMID: 34797944 PMCID: PMC9215356 DOI: 10.1111/tpj.15589] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 08/19/2021] [Revised: 11/10/2021] [Accepted: 11/15/2021] [Indexed: 05/26/2023]
Abstract
As sessile organisms, plants are finely tuned to respond dynamically to developmental, circadian and environmental cues. Genome-wide studies investigating these types of cues have uncovered the intrinsically different ways they can impact gene expression over time. Recent advances in single-cell sequencing and time-based bioinformatic algorithms are now beginning to reveal the dynamics of these time-based responses within individual cells and plant tissues. Here, we review what these techniques have revealed about the spatiotemporal nature of gene regulation, paying particular attention to the three distinct ways in which plant tissues are time sensitive. (i) First, we discuss how studying plant cell identity can reveal developmental trajectories hidden in pseudotime. (ii) Next, we present evidence that indicates that plant cell types keep their own local time through tissue-specific regulation of the circadian clock. (iii) Finally, we review what determines the speed of environmental signaling responses, and how they can be contingent on developmental and circadian time. By these means, this review sheds light on how these different scales of time-based responses can act with tissue and cell-type specificity to elicit changes in whole plant systems.
Affiliation(s)
- Joseph Swift
- Plant Biology Laboratory, The Salk Institute for Biological Studies, 10010 N Torrey Pines Rd, La Jolla, CA 92037, USA
- Kathleen Greenham
- Department of Plant and Microbial Biology, University of Minnesota, St Paul, MN 55108, USA
- Joseph R. Ecker
- Plant Biology Laboratory, The Salk Institute for Biological Studies, 10010 N Torrey Pines Rd, La Jolla, CA 92037, USA
- Howard Hughes Medical Institute, The Salk Institute for Biological Studies, 10010 N Torrey Pines Rd, La Jolla, CA 92037, USA
- Gloria M. Coruzzi
- Department of Biology, Center for Genomics and Systems Biology, New York University, NY, USA
|
148
|
Zhao M, He W, Tang J, Zou Q, Guo F. A hybrid deep learning framework for gene regulatory network inference from single-cell transcriptomic data. Brief Bioinform 2022; 23:6513730. [DOI: 10.1093/bib/bbab568] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/28/2021] [Revised: 12/09/2021] [Accepted: 12/11/2021] [Indexed: 12/21/2022]
Abstract
Inferring gene regulatory networks (GRNs) based on gene expression profiles can provide insight into a number of cellular phenotypes at the genomic level and reveal the essential laws underlying various life phenomena. Unlike bulk expression data, single-cell transcriptomic data embody cell-to-cell variance and diverse biological information, such as tissue characteristics, transformation of cell types, etc. Inferring GRNs based on such data offers unprecedented advantages for in-depth study of cell phenotypes, revealing gene functions and exploring potential interactions. However, the high sparsity, noise and dropout events of single-cell transcriptomic data pose new challenges for regulation identification. We develop a hybrid deep learning framework for GRN inference from single-cell transcriptomic data, DGRNS, which encodes the raw data and fuses a recurrent neural network and a convolutional neural network (CNN) to train a model capable of distinguishing related gene pairs from unrelated gene pairs. To overcome the limitations of such datasets, it applies sliding windows to extract valuable features while preserving the direction of regulation. DGRNS is constructed as a deep learning model containing a gated recurrent unit network for exploring time-dependent information and a CNN for learning spatially related information. Our comprehensive and detailed comparative analysis on the dataset of mouse hematopoietic stem cells illustrates that DGRNS outperforms state-of-the-art methods. For the networks inferred by DGRNS, the area under the receiver operating characteristic curve is about 16% higher than that of other unsupervised methods, and the area under the precision-recall curve is about 10% higher than that of other supervised methods. Experiments on human datasets show the strong robustness and excellent generalization of DGRNS. By comparing the predictions with a standard network, we discover a series of novel interactions which are confirmed to be true in some specific cell types. Importantly, DGRNS identifies a series of regulatory relationships with high confidence and functional consistency, which have not yet been experimentally confirmed and merit further research.
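How sliding windows can extract features from an ordered gene pair while keeping the direction of regulation is sketched below in Python. This is not the DGRNS encoding: the function name `sliding_windows`, the window `width` and `stride`, and the toy expression series are assumptions. Direction is preserved simply by always placing the candidate regulator first in each pair.

```python
from typing import List, Sequence, Tuple

Window = Tuple[List[float], List[float]]


def sliding_windows(regulator: Sequence[float],
                    target: Sequence[float],
                    width: int,
                    stride: int = 1) -> List[Window]:
    """Cut two aligned, time-ordered expression series into paired windows.

    Each window is (regulator_segment, target_segment); swapping the arguments
    yields a different feature set, so the direction of the candidate edge is kept.
    """
    if len(regulator) != len(target):
        raise ValueError("series must have the same length")
    if width <= 0 or stride <= 0:
        raise ValueError("width and stride must be positive")
    last_start = len(regulator) - width
    return [(list(regulator[s:s + width]), list(target[s:s + width]))
            for s in range(0, last_start + 1, stride)]


# Toy series ordered along time or pseudotime (values are illustrative only).
tf_expr = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_expr = [0.0, 1.0, 2.0, 3.0, 4.0]
windows = sliding_windows(tf_expr, gene_expr, width=3, stride=2)
```

Each paired window could then be fed to a sequence model such as a recurrent or convolutional network, with the label indicating whether the ordered pair is a known regulatory interaction.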
|
149
|
Shrivastava H, Zhang X, Song L, Aluru S. GRNUlar: A Deep Learning Framework for Recovering Single-Cell Gene Regulatory Networks. J Comput Biol 2022; 29:27-44. [PMID: 35050715 DOI: 10.1089/cmb.2021.0437] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 12/20/2022]
Abstract
We propose GRNUlar, a novel deep learning framework for supervised learning of gene regulatory networks (GRNs) from single-cell RNA-Sequencing (scRNA-Seq) data. Our framework incorporates two intertwined models. First, we leverage the expressive ability of neural networks to capture complex dependencies between transcription factors and the corresponding genes they regulate, by developing a multitask learning framework. Second, to capture sparsity of GRNs observed in the real world, we design an unrolled algorithm technique for our framework. Our deep architecture requires supervision for training, for which we repurpose existing synthetic data simulators that generate scRNA-Seq data guided by an underlying GRN. Experimental results demonstrate that GRNUlar outperforms state-of-the-art methods on both synthetic and real data sets. Our study also demonstrates the novel and successful use of expression data simulators for supervised learning of GRN inference.
Affiliation(s)
- Harsh Shrivastava
- Department of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA
- Xiuwei Zhang
- Department of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA
- Le Song
- Department of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA
- Srinivas Aluru
- Department of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA
|
150
|
Erbe R, Gore J, Gemmill K, Gaykalova DA, Fertig EJ. The use of machine learning to discover regulatory networks controlling biological systems. Mol Cell 2022; 82:260-273. [PMID: 35016036 PMCID: PMC8905511 DOI: 10.1016/j.molcel.2021.12.011] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 09/21/2021] [Revised: 12/06/2021] [Accepted: 12/13/2021] [Indexed: 01/22/2023]
Abstract
Biological systems are composed of a vast web of multiscale molecular interactors and interactions. High-throughput technologies, both bulk and single cell, now allow for investigation of the properties and quantities of these interactors. Computational algorithms and machine learning methods then provide the tools to derive meaningful insights from the resulting data sets. One such approach is graphical network modeling, which provides a computational framework to explicitly model the molecular interactions within and between the cells comprising biological systems. These graphical networks aim to describe a putative chain of cause and effect between interacting molecules. This feature allows for determination of key molecules in a biological process, accelerated generation of mechanistic hypotheses, and simulation of experimental outcomes. We review the computational concepts and applications of graphical network models across molecular scales for both intracellular and intercellular regulatory biology, examples of successful applications, and the future directions needed to overcome current limitations.
Affiliation(s)
- Rossin Erbe
- Department of Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA; Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA; Convergence Institute, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA
- Jessica Gore
- Institute for Genome Sciences, University of Maryland Medical Center, Baltimore, MD, USA
- Kelly Gemmill
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA; Convergence Institute, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA
- Daria A Gaykalova
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA; Convergence Institute, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA; Institute for Genome Sciences, University of Maryland Medical Center, Baltimore, MD, USA; Department of Otorhinolaryngology-Head and Neck Surgery, University of Maryland Medical Center, Baltimore, MD, USA; Marlene & Stewart Greenebaum Comprehensive Cancer Center, University of Maryland Medical Center, Baltimore, MD, USA
- Elana J Fertig
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA; Convergence Institute, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA; Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD, USA; Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA.
|