1
Xie X, Wang F, Wang G, Zhu W, Du X, Wang H. Learning the cellular activity representation based on gene regulatory networks for prediction of tumor response to drugs. Artif Intell Med 2024; 152:102864. [PMID: 38640702 DOI: 10.1016/j.artmed.2024.102864] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2023] [Revised: 01/28/2024] [Accepted: 03/30/2024] [Indexed: 04/21/2024]
Abstract
Predicting the response of tumor cells to anti-tumor drugs is critical to realizing cancer precision medicine. Currently, most existing methods ignore the regulatory relationships between genes and thus have unsatisfactory predictive performance. In this paper, we propose to predict anti-tumor drug efficacy via learning the activity representation of tumor cells based on a priori knowledge of gene regulation networks (GRNs). Specifically, the method simulates the cellular biosystem by synthesizing a cell-gene activity network and then infers a new low-dimensional activity representation for tumor cells from the raw high-dimensional expression profile. The simulated cell-gene network mainly comprises known gene regulatory networks collected from multiple resources and fuses tumor cells by linking them to hotspot genes that are over- or under-expressed in them. The resulting activity representation not only reflects the shallow expression profile (hotspot genes) but also mines in-depth information on gene regulation activity in tumor cells before treatment. Finally, we build deep learning models on the activity representation for predicting drug efficacy in tumor cells. Experimental results on the benchmark GDSC dataset demonstrate the superior performance of the proposed method over state-of-the-art methods, with the highest AUC of 0.954 in the efficacy label prediction and the best R² of 0.834 in the regression of half maximal inhibitory concentration (IC50) values, suggesting the potential value of the proposed method in practice.
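As an illustration of the hotspot-gene linking step described in this abstract, the following is a minimal sketch and not the authors' implementation. It assumes that over- or under-expression is judged by a per-gene z-score across cells; the function name `hotspot_edges`, the input layout, and the cutoff of 1.5 are hypothetical choices made only for this example.

```python
from statistics import mean, pstdev


def hotspot_edges(expression, z_cutoff=1.5):
    """Link each cell to genes that are unusually high or low in it.

    expression: dict mapping cell -> dict mapping gene -> expression value.
    z_cutoff: hypothetical threshold on |z-score| (an assumption, not from the paper).
    Returns a list of (cell, gene, "over" | "under") tuples.
    """
    genes = sorted({g for profile in expression.values() for g in profile})
    edges = []
    for gene in genes:
        values = [p[gene] for p in expression.values() if gene in p]
        mu, sigma = mean(values), pstdev(values)
        if sigma == 0:
            continue  # gene is constant across cells, so it cannot be a hotspot
        for cell, profile in expression.items():
            if gene not in profile:
                continue
            z = (profile[gene] - mu) / sigma
            if abs(z) >= z_cutoff:
                edges.append((cell, gene, "over" if z > 0 else "under"))
    return edges
```

The resulting cell-gene edges could then be merged with a prior gene-gene regulatory network to form a combined graph; how the paper actually builds and embeds that graph is not specified in the abstract.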
Affiliation(s)
- Xinping Xie
  - School of Mathematics and Physics, Anhui Jianzhu University, Hefei, China
- Fengting Wang
  - School of Mathematics and Physics, Anhui Jianzhu University, Hefei, China; Institute of Intelligent Machines, Hefei Institutes of Physical Science, CAS, Hefei, China
- Guanfu Wang
  - School of Mathematics and Physics, Anhui Jianzhu University, Hefei, China
- Weiwei Zhu
  - Institute of Intelligent Machines, Hefei Institutes of Physical Science, CAS, Hefei, China; Zhongqi AI Lab, Hefei, China
- Xiaodong Du
  - Experimental Teaching Center, Hefei University, Hefei, China
- Hongqiang Wang
  - Institute of Intelligent Machines, Hefei Institutes of Physical Science, CAS, Hefei, China; Zhongqi AI Lab, Hefei, China
2
Chen C, Padi M. Flexible modeling of regulatory networks improves transcription factor activity estimation. NPJ Syst Biol Appl 2024; 10:58. [PMID: 38806476 DOI: 10.1038/s41540-024-00386-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2024] [Accepted: 05/13/2024] [Indexed: 05/30/2024] Open
Abstract
Transcriptional regulation plays a crucial role in determining cell fate and disease, yet inferring the key regulators from gene expression data remains a significant challenge. Existing methods for estimating transcription factor (TF) activity often rely on static TF-gene interaction databases and cannot adapt to changes in regulatory mechanisms across different cell types and disease conditions. Here, we present a new algorithm - Transcriptional Inference using Gene Expression and Regulatory data (TIGER) - that overcomes these limitations by flexibly modeling activation and inhibition events, up-weighting essential edges, shrinking irrelevant edges towards zero through a sparse Bayesian prior, and simultaneously estimating both TF activity levels and changes in the underlying regulatory network. When applied to yeast and cancer TF knock-out datasets, TIGER outperforms comparable methods in terms of prediction accuracy. Moreover, our application of TIGER to tissue- and cell-type-specific RNA-seq data demonstrates its ability to uncover differences in regulatory mechanisms. Collectively, our findings highlight the utility of modeling context-specific regulation when inferring transcription factor activities.
Affiliation(s)
- Chen Chen
  - Department of Epidemiology and Biostatistics, University of Arizona Mel and Enid Zuckerman College of Public Health, Tucson, AZ, USA
  - University of Arizona Cancer Center, University of Arizona, Tucson, AZ, USA
- Megha Padi
  - University of Arizona Cancer Center, University of Arizona, Tucson, AZ, USA
  - Department of Molecular and Cellular Biology, University of Arizona, Tucson, AZ, USA
3
Tjärnberg A, Beheler-Amass M, Jackson CA, Christiaen LA, Gresham D, Bonneau R. Structure-primed embedding on the transcription factor manifold enables transparent model architectures for gene regulatory network and latent activity inference. Genome Biol 2024; 25:24. [PMID: 38238840 PMCID: PMC10797903 DOI: 10.1186/s13059-023-03134-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2023] [Accepted: 11/30/2023] [Indexed: 01/22/2024] Open
Abstract
BACKGROUND Modeling of gene regulatory networks (GRNs) is limited due to a lack of direct measurements of genome-wide transcription factor activity (TFA), making it difficult to separate covariance and regulatory interactions. Inference of regulatory interactions and TFA requires aggregation of complementary evidence. Estimating TFA explicitly is problematic as it disconnects GRN inference and TFA estimation and is unable to account for, for example, contextual transcription factor-transcription factor interactions, and other higher order features. Deep learning offers a potential solution, as it can model complex interactions and higher-order latent features, although it does not provide interpretable models and latent features. RESULTS We propose a novel autoencoder-based framework, StrUcture Primed Inference of Regulation using latent Factor ACTivity (SupirFactor) for modeling, and a metric, explained relative variance (ERV), for interpretation of GRNs. We evaluate SupirFactor with ERV in a wide set of contexts. Compared to current state-of-the-art GRN inference methods, SupirFactor performs favorably. We evaluate latent feature activity as an estimate of TFA and biological function in S. cerevisiae as well as in peripheral blood mononuclear cells (PBMC). CONCLUSION Here we present a framework for structure-primed inference and interpretation of GRNs, SupirFactor, demonstrating interpretability using ERV in multiple biological and experimental settings. SupirFactor enables TFA estimation and pathway analysis using latent factor activity, demonstrated here on two large-scale single-cell datasets, modeling S. cerevisiae and PBMC. We find that the SupirFactor model facilitates biological analysis, yielding novel functional and regulatory insight.
Affiliation(s)
- Andreas Tjärnberg
  - Center for Developmental Genetics, New York University, New York, NY, 10003, USA
  - Center For Genomics and Systems Biology, NYU, New York, NY, 10008, USA
  - Department of Biology, NYU, New York, NY, 10008, USA
  - Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, 10010, USA
  - Department of Neuro-Science, University of Wisconsin-Madison - Waisman Center, Madison, USA
- Maggie Beheler-Amass
  - Center For Genomics and Systems Biology, NYU, New York, NY, 10008, USA
  - Department of Biology, NYU, New York, NY, 10008, USA
- Christopher A Jackson
  - Center For Genomics and Systems Biology, NYU, New York, NY, 10008, USA
  - Department of Biology, NYU, New York, NY, 10008, USA
- Lionel A Christiaen
  - Center for Developmental Genetics, New York University, New York, NY, 10003, USA
  - Department of Biology, NYU, New York, NY, 10008, USA
  - Sars International Centre for Marine Molecular Biology, University of Bergen, Bergen, Norway
  - Department of Heart Disease, Haukeland University Hospital, Bergen, Norway
- David Gresham
  - Center For Genomics and Systems Biology, NYU, New York, NY, 10008, USA
  - Department of Biology, NYU, New York, NY, 10008, USA
- Richard Bonneau
  - Center For Genomics and Systems Biology, NYU, New York, NY, 10008, USA
  - Department of Biology, NYU, New York, NY, 10008, USA
  - Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, NY, 10010, USA
  - Courant Institute of Mathematical Sciences, Computer Science Department, New York University, New York, NY, 10003, USA
  - Center For Data Science, NYU, New York, NY, 10008, USA
  - Prescient Design, a Genentech accelerator, New York, NY, 10010, USA
4
Wu Y, Qian B, Wang A, Dong H, Zhu E, Ma B. iLSGRN: inference of large-scale gene regulatory networks based on multi-model fusion. Bioinformatics 2023; 39:btad619. [PMID: 37851379 PMCID: PMC10589915 DOI: 10.1093/bioinformatics/btad619] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Revised: 10/04/2023] [Accepted: 10/17/2023] [Indexed: 10/19/2023] Open
Abstract
MOTIVATION Gene regulatory networks (GRNs) are a way of describing the interactions between genes, which contributes to revealing the different biological mechanisms in the cell. Reconstructing GRNs based on gene expression data has been a central computational problem in systems biology. However, due to the high dimensionality and non-linearity of large-scale GRNs, accurately and efficiently inferring GRNs is still a challenging task. RESULTS In this article, we propose a new approach, iLSGRN, to reconstruct large-scale GRNs from steady-state and time-series gene expression data based on non-linear ordinary differential equations. First, the regulatory gene recognition algorithm calculates the Maximal Information Coefficient between genes and excludes redundant regulatory relationships to achieve dimensionality reduction. Then, the feature fusion algorithm constructs a model leveraging the feature importance derived from XGBoost (eXtreme Gradient Boosting) and RF (Random Forest) models, which can effectively train the non-linear ordinary differential equations model of GRNs and improve the accuracy and stability of the inference algorithm. Extensive experiments on datasets of different scales show that our method achieves a clear improvement over the state-of-the-art methods. Furthermore, we perform cross-validation experiments on real gene datasets to validate the robustness and effectiveness of the proposed method. AVAILABILITY AND IMPLEMENTATION The proposed method is written in the Python language, and is available at: https://github.com/lab319/iLSGRN.
Affiliation(s)
- Yiming Wu
  - School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
- Bing Qian
  - School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
- Anqi Wang
  - Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong 999077, China
- Heng Dong
  - School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
- Enqiang Zhu
  - Institution of Computing Science and Technology, Guangzhou University, Guangzhou 510006, China
- Baoshan Ma
  - School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
5
Pan TC, Chockalingam SP, Aluru M, Aluru S. MCPNet: a parallel maximum capacity-based genome-scale gene network construction framework. Bioinformatics 2023; 39:btad373. [PMID: 37289522 PMCID: PMC10287961 DOI: 10.1093/bioinformatics/btad373] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2022] [Revised: 04/06/2023] [Accepted: 06/06/2023] [Indexed: 06/10/2023] Open
Abstract
MOTIVATION Gene network reconstruction from gene expression profiles is a compute- and data-intensive problem. Numerous methods based on diverse approaches including mutual information, random forests, Bayesian networks, correlation measures, as well as their transforms and filters such as data processing inequality, have been proposed. However, an effective gene network reconstruction method that performs well in all three aspects of computational efficiency, data size scalability, and output quality remains elusive. Simple techniques such as Pearson correlation are fast to compute but ignore indirect interactions, while more robust methods such as Bayesian networks are prohibitively time consuming to apply to tens of thousands of genes. RESULTS We developed the maximum capacity path (MCP) score, a novel maximum-capacity-path-based metric to quantify the relative strengths of direct and indirect gene-gene interactions. We further present MCPNet, an efficient, parallelized gene network reconstruction software based on the MCP score, to reverse engineer networks in unsupervised and ensemble manners. Using synthetic and real Saccharomyces cerevisiae datasets as well as real Arabidopsis thaliana datasets, we demonstrate that MCPNet produces better quality networks as measured by AUPRC, is significantly faster than all other gene network reconstruction software, and also scales well to tens of thousands of genes and hundreds of CPU cores. Thus, MCPNet represents a new gene network reconstruction tool that simultaneously achieves quality, performance, and scalability requirements. AVAILABILITY AND IMPLEMENTATION Source code freely available for download at https://doi.org/10.5281/zenodo.6499747 and https://github.com/AluruLab/MCPNet, implemented in C++ and supported on Linux.
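To make the maximum-capacity-path idea in this abstract concrete, here is a minimal sketch of the generic widest-path (bottleneck) computation on a weighted graph. It is not MCPNet's scoring code: the abstract does not give the exact MCP score definition, and the function name `max_capacity`, the adjacency-dict layout, and the use of a Dijkstra-style search are assumptions for illustration.

```python
import heapq


def max_capacity(weights, source, target):
    """Return the best bottleneck weight over all paths from source to target.

    weights: dict mapping node -> dict of neighbor -> non-negative edge weight
             (for example, a pairwise association strength between two genes).
    The capacity of a path is its weakest edge; the result is the largest
    such capacity over all paths, or 0.0 if target is unreachable.
    """
    best = {source: float("inf")}
    heap = [(-float("inf"), source)]  # max-heap via negated capacities
    while heap:
        neg_cap, node = heapq.heappop(heap)
        cap = -neg_cap
        if node == target:
            return cap
        if cap < best.get(node, 0.0):
            continue  # stale entry: a wider path to this node was already found
        for nbr, w in weights.get(node, {}).items():
            new_cap = min(cap, w)
            if new_cap > best.get(nbr, 0.0):
                best[nbr] = new_cap
                heapq.heappush(heap, (-new_cap, nbr))
    return 0.0
```

Comparing the direct edge weight between two genes with the capacity of the best indirect path is one way to reason about direct versus indirect interactions; the specific ratio or transform MCPNet uses should be taken from the paper or its source code.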
Affiliation(s)
- Tony C Pan
  - Department of Biomedical Informatics, Emory University, Woodruff Memorial Research Building 101 Woodruff Circle, 4th Floor East, Atlanta, GA 30322, United States
  - Institute for Data Engineering and Science, Georgia Institute of Technology, 756 W Peachtree St NW, 12th Floor, Atlanta, GA 30332, United States
- Sriram P Chockalingam
  - Institute for Data Engineering and Science, Georgia Institute of Technology, 756 W Peachtree St NW, 12th Floor, Atlanta, GA 30332, United States
- Maneesha Aluru
  - School of Biological Sciences, Georgia Institute of Technology, 310 Ferst Dr NW, Atlanta, GA 30332, United States
- Srinivas Aluru
  - Institute for Data Engineering and Science, Georgia Institute of Technology, 756 W Peachtree St NW, 12th Floor, Atlanta, GA 30332, United States
  - School of Computational Science and Engineering, Georgia Institute of Technology, 756 W Peachtree St NW, 13th Floor, Atlanta, GA 30332, United States
6
Zhang S, Pyne S, Pietrzak S, Halberg S, McCalla SG, Siahpirani AF, Sridharan R, Roy S. Inference of cell type-specific gene regulatory networks on cell lineages from single cell omic datasets. Nat Commun 2023; 14:3064. [PMID: 37244909 PMCID: PMC10224950 DOI: 10.1038/s41467-023-38637-9] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 07/25/2022] [Accepted: 05/10/2023] [Indexed: 05/29/2023] Open
Abstract
Cell type-specific gene expression patterns are outputs of transcriptional gene regulatory networks (GRNs) that connect transcription factors and signaling proteins to target genes. Single-cell technologies such as single-cell RNA-sequencing (scRNA-seq) and single-cell Assay for Transposase-Accessible Chromatin using sequencing (scATAC-seq) can examine cell type-specific gene regulation in unprecedented detail. However, current approaches to infer cell type-specific GRNs are limited in their ability to integrate scRNA-seq and scATAC-seq measurements and to model network dynamics on a cell lineage. To address this challenge, we have developed single-cell Multi-Task Network Inference (scMTNI), a multi-task learning framework to infer the GRN for each cell type on a lineage from scRNA-seq and scATAC-seq data. Using simulated and real datasets, we show that scMTNI is a broadly applicable framework for linear and branching lineages that accurately infers GRN dynamics and identifies key regulators of fate transitions for diverse processes such as cellular reprogramming and differentiation.
Affiliation(s)
- Shilu Zhang
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
- Saptarshi Pyne
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
- Stefan Pietrzak
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
  - Department of Cell and Regenerative Biology, University of Wisconsin-Madison, Madison, WI, USA
- Spencer Halberg
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
- Sunnie Grace McCalla
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
  - Laboratory of Genetics, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Alireza Fotuhi Siahpirani
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
  - Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
- Rupa Sridharan
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
  - Department of Cell and Regenerative Biology, University of Wisconsin-Madison, Madison, WI, USA
- Sushmita Roy
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
7
McCalla SG, Fotuhi Siahpirani A, Li J, Pyne S, Stone M, Periyasamy V, Shin J, Roy S. Identifying strengths and weaknesses of methods for computational network inference from single-cell RNA-seq data. G3 (Bethesda, Md.) 2023; 13:jkad004. [PMID: 36626328 PMCID: PMC9997554 DOI: 10.1093/g3journal/jkad004] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 11/09/2022] [Revised: 11/09/2022] [Accepted: 12/16/2022] [Indexed: 01/11/2023]
Abstract
Single-cell RNA-sequencing (scRNA-seq) offers unparalleled insight into the transcriptional programs of different cellular states by measuring the transcriptome of thousands of individual cells. An emerging problem in the analysis of scRNA-seq is the inference of transcriptional gene regulatory networks, and a number of methods with different learning frameworks have been developed to address this problem. Here, we present an expanded benchmarking study of eleven recent network inference methods on seven published scRNA-seq datasets in human, mouse, and yeast considering different types of gold standard networks and evaluation metrics. We evaluate methods based on their computing requirements as well as on their ability to recover the network structure. We find that, while most methods have a modest recovery of experimentally derived interactions based on global metrics such as Area Under the Precision Recall curve, methods are able to capture targets of regulators that are relevant to the system under study. Among the top performing methods that use only expression were SCENIC, PIDC, MERLIN, and Correlation. Addition of prior biological knowledge and the estimation of transcription factor activities resulted in the best overall performance, with the Inferelator and MERLIN methods that use prior knowledge outperforming methods that use expression alone. We found that imputation for network inference did not improve network inference accuracy and could be detrimental. Comparisons of inferred networks for comparable bulk conditions showed that the networks inferred from scRNA-seq datasets are often better than or on par with the networks inferred from bulk datasets. Our analysis should be beneficial in selecting methods for network inference. At the same time, this highlights the need for improved methods and better gold standards for regulatory network inference from scRNA-seq datasets.
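The abstract above evaluates inferred networks with the Area Under the Precision Recall curve. As a rough illustration of how a ranked edge list can be scored against a gold standard, the sketch below computes average precision, a common step-wise approximation of that area. It is not the benchmark's evaluation code; the function name `average_precision` and the edge representation as (regulator, target) tuples are assumptions for this example.

```python
def average_precision(scored_edges, gold_edges):
    """Approximate the area under the precision-recall curve.

    scored_edges: list of (edge, score) pairs, where edge is a hashable
                  such as a (regulator, target) tuple.
    gold_edges:   set of edges treated as true interactions.
    Gold edges that were never predicted count as misses, because the sum
    is divided by the total number of gold edges.
    """
    ranked = sorted(scored_edges, key=lambda pair: pair[1], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (edge, _score) in enumerate(ranked, start=1):
        if edge in gold_edges:
            hits += 1
            precision_sum += hits / rank  # precision at this recall step
    return precision_sum / len(gold_edges) if gold_edges else 0.0
```

Because gold standards for regulatory networks are incomplete, a low value here may reflect missing ground truth as much as poor inference, which is consistent with the abstract's call for better gold standards.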
Affiliation(s)
- Sunnie Grace McCalla
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715, USA
  - Laboratory of Genetics, University of Wisconsin-Madison, Madison, WI 53706, USA
- Jiaxin Li
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715, USA
  - Laboratory of Genetics, University of Wisconsin-Madison, Madison, WI 53706, USA
- Saptarshi Pyne
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715, USA
- Matthew Stone
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715, USA
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53792, USA
- Viswesh Periyasamy
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715, USA
  - Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI 53706, USA
- Junha Shin
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715, USA
- Sushmita Roy
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715, USA
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53792, USA
  - Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI 53706, USA
8
Tjärnberg A, Beheler-Amass M, Jackson CA, Christiaen LA, Gresham D, Bonneau R. Structure primed embedding on the transcription factor manifold enables transparent model architectures for gene regulatory network and latent activity inference. bioRxiv 2023:2023.02.02.526909. [PMID: 36778259 PMCID: PMC9915715 DOI: 10.1101/2023.02.02.526909] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/07/2023]
Abstract
The modeling of gene regulatory networks (GRNs) is limited due to a lack of direct measurements of regulatory features in genome-wide screens. Most GRN inference methods are therefore forced to model relationships between regulatory genes and their targets with expression as a proxy for the upstream independent features, complicating validation and predictions produced by modeling frameworks. Separating covariance and regulatory influence requires aggregation of independent and complementary sets of evidence, such as transcription factor (TF) binding and target gene expression. However, the complete regulatory state of the system, e.g. TF activity (TFA) is unknown due to a lack of experimental feasibility, making regulatory relations difficult to infer. Some methods attempt to account for this by modeling TFA as a latent feature, but these models often use linear frameworks that are unable to account for non-linearities such as saturation, TF-TF interactions, and other higher order features. Deep learning frameworks may offer a solution, as they are capable of modeling complex interactions and capturing higher-order latent features. However, these methods often discard central concepts in biological systems modeling, such as sparsity and latent feature interpretability, in favor of increased model complexity. We propose a novel deep learning autoencoder-based framework, StrUcture Primed Inference of Regulation using latent Factor ACTivity (SupirFactor), that scales to single cell genomic data and maintains interpretability to perform GRN inference and estimate TFA as a latent feature. We demonstrate that SupirFactor outperforms current leading GRN inference methods, predicts biologically relevant TFA and elucidates functional regulatory pathways through aggregation of TFs.
Affiliation(s)
- Andreas Tjärnberg
  - Center for Developmental Genetics, New York University, New York, NY 10003, USA
  - Center For Genomics and Systems Biology, NYU, New York, NY 10008, USA
  - Department of Biology, NYU, New York, NY 10008, USA
  - Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY 10010, USA
- Maggie Beheler-Amass
  - Center For Genomics and Systems Biology, NYU, New York, NY 10008, USA
  - Department of Biology, NYU, New York, NY 10008, USA
- Christopher A Jackson
  - Center For Genomics and Systems Biology, NYU, New York, NY 10008, USA
  - Department of Biology, NYU, New York, NY 10008, USA
- Lionel A Christiaen
  - Center for Developmental Genetics, New York University, New York, NY 10003, USA
  - Department of Biology, NYU, New York, NY 10008, USA
  - Sars International Centre for Marine Molecular Biology, University of Bergen, Bergen, Norway
  - Department of Heart Disease, Haukeland University Hospital, Bergen, Norway
- David Gresham
  - Center For Genomics and Systems Biology, NYU, New York, NY 10008, USA
  - Department of Biology, NYU, New York, NY 10008, USA
- Richard Bonneau
  - Center For Genomics and Systems Biology, NYU, New York, NY 10008, USA
  - Department of Biology, NYU, New York, NY 10008, USA
  - Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, NY 10010, USA
  - Courant Institute of Mathematical Sciences, Computer Science Department, New York University, New York, NY 10003, USA
  - Center For Data Science, NYU, New York, NY 10008, USA
  - Prescient Design, a Genentech accelerator, New York, NY 10010, USA
9
Liu Y, Yan H, Shen LC, Yu DJ. Learning Cell Annotation under Multiple Reference Datasets by Multisource Domain Adaptation. J Chem Inf Model 2023; 63:397-405. [PMID: 36579851 DOI: 10.1021/acs.jcim.2c01277] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/30/2022]
Abstract
Accurate and efficient cell type annotation is essential for single-cell sequence analysis. Currently, cell type annotation using well-annotated reference datasets with powerful models has become increasingly popular. However, with the increasing amount of single-cell data, there is an urgent need to develop a novel annotation method that can integrate multiple reference datasets to improve cell type annotation performance. Because of unwanted batch effects between individual reference datasets, integrating multiple reference datasets is still an open challenge. To address this, we proposed scMDR and scMultiR, which use multisource domain adaptation to learn cell type-specific information from multiple reference datasets and query cells. Based on the learned cell type-specific information, scMDR and scMultiR provide the most likely cell types for the query cells. Benchmark experiments demonstrated their state-of-the-art effectiveness for integrative single-cell assignment with multiple reference datasets.
Affiliation(s)
- Yan Liu
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing, Jiangsu 210094, China
- He Yan
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing, Jiangsu 210094, China
- Long-Chen Shen
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing, Jiangsu 210094, China
- Dong-Jun Yu
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing, Jiangsu 210094, China
10
Escorcia-Rodríguez JM, Gaytan-Nuñez E, Hernandez-Benitez EM, Zorro-Aranda A, Tello-Palencia MA, Freyre-González JA. Improving gene regulatory network inference and assessment: The importance of using network structure. Front Genet 2023; 14:1143382. [PMID: 36926589 PMCID: PMC10012345 DOI: 10.3389/fgene.2023.1143382] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2023] [Accepted: 02/20/2023] [Indexed: 03/03/2023] Open
Abstract
Gene regulatory networks are graph models representing cellular transcription events. Networks are far from complete due to time and resource consumption for experimental validation and curation of the interactions. Previous assessments have shown the modest performance of the available network inference methods based on gene expression data. Here, we study several caveats on the inference of regulatory networks and methods assessment through the quality of the input data and gold standard, and the assessment approach with a focus on the global structure of the network. We used synthetic and biological data for the predictions and experimentally-validated biological networks as the gold standard (ground truth). Standard performance metrics and graph structural properties suggest that methods inferring co-expression networks should no longer be assessed equally with those inferring regulatory interactions. While methods inferring regulatory interactions perform better in global regulatory network inference than co-expression-based methods, the latter is better suited to infer function-specific regulons and co-regulation networks. When merging expression data, the size increase should outweigh the noise inclusion and graph structure should be considered when integrating the inferences. We conclude with guidelines to take advantage of inference methods and their assessment based on the applications and available expression datasets.
Affiliation(s)
- Juan M Escorcia-Rodríguez
  - Regulatory Systems Biology Research Group, Program of Systems Biology, Center for Genomic Sciences, Universidad Nacional Autónoma de México, Cuernavaca, Mexico
- Estefani Gaytan-Nuñez
  - Regulatory Systems Biology Research Group, Program of Systems Biology, Center for Genomic Sciences, Universidad Nacional Autónoma de México, Cuernavaca, Mexico; Undergraduate Program in Genomic Sciences, Center for Genomic Sciences, Universidad Nacional Autónoma de México, Cuernavaca, Mexico
- Ericka M Hernandez-Benitez
  - Regulatory Systems Biology Research Group, Program of Systems Biology, Center for Genomic Sciences, Universidad Nacional Autónoma de México, Cuernavaca, Mexico; Undergraduate Program in Genomic Sciences, Center for Genomic Sciences, Universidad Nacional Autónoma de México, Cuernavaca, Mexico
- Andrea Zorro-Aranda
  - Regulatory Systems Biology Research Group, Program of Systems Biology, Center for Genomic Sciences, Universidad Nacional Autónoma de México, Cuernavaca, Mexico; Department of Chemical Engineering, Universidad de Antioquia, Medellín, Colombia
- Marco A Tello-Palencia
  - Regulatory Systems Biology Research Group, Program of Systems Biology, Center for Genomic Sciences, Universidad Nacional Autónoma de México, Cuernavaca, Mexico; Undergraduate Program in Genomic Sciences, Center for Genomic Sciences, Universidad Nacional Autónoma de México, Cuernavaca, Mexico
- Julio A Freyre-González
  - Regulatory Systems Biology Research Group, Program of Systems Biology, Center for Genomic Sciences, Universidad Nacional Autónoma de México, Cuernavaca, Mexico
11
|
Özel MN, Gibbs CS, Holguera I, Soliman M, Bonneau R, Desplan C. Coordinated control of neuronal differentiation and wiring by sustained transcription factors. Science 2022; 378:eadd1884. [PMID: 36480601 DOI: 10.1126/science.add1884] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]
Abstract
The large diversity of cell types in nervous systems presents a challenge in identifying the genetic mechanisms that encode it. Here, we report that nearly 200 distinct neurons in the Drosophila visual system can each be defined by unique combinations of on average 10 continuously expressed transcription factors. We show that targeted modifications of this terminal selector code induce predictable conversions of neuronal fates that appear morphologically and transcriptionally complete. Cis-regulatory analysis of open chromatin links one of these genes to an upstream patterning factor that specifies neuronal fates in stem cells. Experimentally validated network models describe the synergistic regulation of downstream effectors by terminal selectors and ecdysone signaling during brain wiring. Our results provide a generalizable framework of how specific fates are implemented in postmitotic neurons.
Affiliation(s)
- Claudia Skok Gibbs
  - Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, NY 10010, USA; Center for Data Science, New York University, New York, NY 10003, USA
- Isabel Holguera
  - Department of Biology, New York University, New York, NY 10003, USA
- Mennah Soliman
  - Department of Biology, New York University, New York, NY 10003, USA
- Richard Bonneau
  - Department of Biology, New York University, New York, NY 10003, USA; Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, NY 10010, USA; Center for Data Science, New York University, New York, NY 10003, USA
- Claude Desplan
  - Department of Biology, New York University, New York, NY 10003, USA; New York University Abu Dhabi, Saadiyat Island, Abu Dhabi, United Arab Emirates
12
Galindez G, Sadegh S, Baumbach J, Kacprowski T, List M. Network-based approaches for modeling disease regulation and progression. Comput Struct Biotechnol J 2022; 21:780-795. [PMID: 36698974 PMCID: PMC9841310 DOI: 10.1016/j.csbj.2022.12.022] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 08/15/2022] [Revised: 12/14/2022] [Accepted: 12/14/2022] [Indexed: 12/23/2022] Open
Abstract
Molecular interaction networks lay the foundation for studying how biological functions are controlled by the complex interplay of genes and proteins. Investigating perturbed processes using biological networks has been instrumental in uncovering mechanisms that underlie complex disease phenotypes. Rapid advances in omics technologies have prompted the generation of high-throughput datasets, enabling large-scale, network-based analyses. Consequently, various modeling techniques, including network enrichment, differential network extraction, and network inference, have proven to be useful for gaining new mechanistic insights. We provide an overview of recent network-based methods and their core ideas to facilitate the discovery of disease modules or candidate mechanisms. Knowledge generated from these computational efforts will benefit biomedical research, especially drug development and precision medicine. We further discuss current challenges and provide perspectives in the field, highlighting the need for more integrative and dynamic network approaches to model disease development and progression.
Affiliation(s)
- Gihanna Galindez
  - Division Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics of Technische Universität Braunschweig and Hannover Medical School, Braunschweig, Germany; Braunschweig Integrated Centre of Systems Biology (BRICS), TU Braunschweig, Braunschweig, Germany
- Sepideh Sadegh
  - Chair of Experimental Bioinformatics, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany; Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Jan Baumbach
  - Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany; Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
- Tim Kacprowski
  - Division Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics of Technische Universität Braunschweig and Hannover Medical School, Braunschweig, Germany; Braunschweig Integrated Centre of Systems Biology (BRICS), TU Braunschweig, Braunschweig, Germany
- Markus List
  - Chair of Experimental Bioinformatics, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
13
Vittadello ST, Stumpf MPH. Open problems in mathematical biology. Math Biosci 2022; 354:108926. [PMID: 36377100 DOI: 10.1016/j.mbs.2022.108926] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/16/2022] [Revised: 10/21/2022] [Accepted: 10/21/2022] [Indexed: 11/06/2022]
Abstract
Biology is data-rich, and it is equally rich in concepts and hypotheses. Part of trying to understand biological processes and systems is therefore to confront our ideas and hypotheses with data using statistical methods to determine the extent to which our hypotheses agree with reality. But doing so in a systematic way is becoming increasingly challenging as our hypotheses become more detailed, and our data becomes more complex. Mathematical methods are therefore gaining in importance across the life- and biomedical sciences. Mathematical models allow us to test our understanding, make testable predictions about future behaviour, and gain insights into how we can control the behaviour of biological systems. It has been argued that mathematical methods can be of great benefit to biologists to make sense of data. But mathematics and mathematicians are set to benefit equally from considering the often bewildering complexity inherent to living systems. Here we present a small selection of open problems and challenges in mathematical biology. We have chosen these open problems because they are of both biological and mathematical interest.
Affiliation(s)
- Sean T Vittadello
  - Melbourne Integrative Genomics, University of Melbourne, Australia; School of BioSciences, University of Melbourne, Australia
- Michael P H Stumpf
  - Melbourne Integrative Genomics, University of Melbourne, Australia; School of BioSciences, University of Melbourne, Australia; School of Mathematics and Statistics, University of Melbourne, Australia
14
Gan Y, Hu X, Zou G, Yan C, Xu G. Inferring Gene Regulatory Networks From Single-Cell Transcriptomic Data Using Bidirectional RNN. Front Oncol 2022; 12:899825. [PMID: 35692809 PMCID: PMC9178250 DOI: 10.3389/fonc.2022.899825] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2022] [Accepted: 04/22/2022] [Indexed: 11/30/2022] Open
Abstract
Accurate inference of gene regulatory rules is critical to understanding cellular processes. Existing computational methods usually decompose the inference of gene regulatory networks (GRNs) into multiple subproblems, rather than detecting potential causal relationships simultaneously, which limits the application to data with a small number of genes. Here, we propose BiRGRN, a novel computational algorithm for inferring GRNs from time-series single-cell RNA-seq (scRNA-seq) data. BiRGRN utilizes a bidirectional recurrent neural network to infer GRNs. The recurrent neural network is a deep neural network that can capture complex, non-linear, and dynamic relationships among variables. It maps neurons to genes, and maps the connections between neural network layers to the regulatory relationships between genes, providing an intuitive solution to model GRNs with biological closeness and mathematical flexibility. Based on the deep network, we transform the inference of GRNs into a regression problem, using the gene expression data at previous time points to predict the gene expression data at the later time point. Furthermore, we adopt two strategies to improve the accuracy and stability of the algorithm. Specifically, we utilize a bidirectional structure to integrate the forward and reverse inference results and exploit an incomplete set of prior knowledge to filter out some candidate inferences of low confidence. BiRGRN is applied to four simulated datasets and three real scRNA-seq datasets to verify the proposed method. We perform comprehensive comparisons between our proposed method and other state-of-the-art techniques. These experimental results indicate that BiRGRN is capable of inferring GRNs simultaneously from time-series scRNA-seq data. Our method BiRGRN is implemented in Python using the TensorFlow machine-learning library, and it is freely available at https://gitee.com/DHUDBLab/bi-rgrn.
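The regression framing in this abstract (predict expression at a later time point from expression at the previous one) can be sketched in plain Python. This is not BiRGRN: ordinary least squares stands in for the bidirectional recurrent network, there is no reverse pass and no prior-knowledge filter, and the toy data are noise-free. The function names, the interaction matrix `true_a`, and the 0.1 edge threshold are all hypothetical choices made for illustration.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def lagged_weights(series, ridge=0.0):
    """series[t][g] is the expression of gene g at time t.

    Returns w with w[target][regulator] fitted by least squares of
    x[t + 1] on x[t]; `ridge` is an optional stabiliser (assumption).
    """
    n_genes = len(series[0])
    prev, nxt = series[:-1], series[1:]
    # Normal equations: (X^T X + ridge * I) w = X^T y, one solve per target.
    xtx = [[sum(p[i] * p[j] for p in prev) + (ridge if i == j else 0.0)
            for j in range(n_genes)] for i in range(n_genes)]
    weights = []
    for target in range(n_genes):
        xty = [sum(p[i] * q[target] for p, q in zip(prev, nxt))
               for i in range(n_genes)]
        weights.append(solve(xtx, xty))
    return weights


# Hypothetical 3-gene system: gene 0 activates gene 1, gene 1 represses gene 2.
true_a = [[0.9, 0.0, 0.0], [0.5, 0.5, 0.0], [0.0, -0.4, 0.8]]
series = [[1.0, 0.2, 0.5]]
for _ in range(11):
    x = series[-1]
    series.append([sum(true_a[g][r] * x[r] for r in range(3)) for g in range(3)])

weights = lagged_weights(series)
# Candidate edges (regulator, target): off-diagonal weights above a threshold.
edges = [(r, t) for t in range(3) for r in range(3)
         if r != t and abs(weights[t][r]) > 0.1]
```

Because the toy trajectory is generated by a linear rule without noise, the fitted weights match `true_a` closely; real scRNA-seq data would need regularisation and a non-linear model such as the one the authors describe.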
Affiliation(s)
- Yanglan Gan
  - School of Computer Science and Technology, Donghua University, Shanghai, China
- Xin Hu
  - School of Computer Science and Technology, Donghua University, Shanghai, China
- Guobing Zou
  - School of Computer Engineering and Science, Shanghai University, Shanghai, China
- Cairong Yan
  - School of Computer Science and Technology, Donghua University, Shanghai, China
- Guangwei Xu
  - School of Computer Science and Technology, Donghua University, Shanghai, China
  - *Correspondence: Guangwei Xu,
15
Freyre-González JA, Escorcia-Rodríguez JM, Gutiérrez-Mondragón LF, Martí-Vértiz J, Torres-Franco CN, Zorro-Aranda A. System Principles Governing the Organization, Architecture, Dynamics, and Evolution of Gene Regulatory Networks. Front Bioeng Biotechnol 2022; 10:888732. [PMID: 35646858 PMCID: PMC9135355 DOI: 10.3389/fbioe.2022.888732] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/03/2022] [Accepted: 04/27/2022] [Indexed: 11/21/2022] Open
Abstract
Synthetic biology aims to apply engineering principles for the rational, systematical design and construction of biological systems displaying functions that do not exist in nature or even building a cell from scratch. Understanding how molecular entities interconnect, work, and evolve in an organism is pivotal to this aim. Here, we summarize and discuss some historical organizing principles identified in bacterial gene regulatory networks. We propose a new layer, the concilion, which is the group of structural genes and their local regulators responsible for a single function that, organized hierarchically, coordinate a response in a way reminiscent of the deliberation and negotiation that take place in a council. We then highlight the importance that the network structure has, and discuss that the natural decomposition approach has unveiled the system-level elements shaping a common functional architecture governing bacterial regulatory networks. We discuss the incompleteness of gene regulatory networks and the need for network inference and benchmarking standardization. We point out the importance that using the network structural properties showed to improve network inference. We discuss the advances and controversies regarding the consistency between reconstructions of regulatory networks and expression data. We then discuss some perspectives on the necessity of studying regulatory networks, considering the interactions’ strength distribution, the challenges to studying these interactions’ strength, and the corresponding effects on network structure and dynamics. Finally, we explore the ability of evolutionary systems biology studies to provide insights into how evolution shapes functional architecture despite the high evolutionary plasticity of regulatory networks.
Affiliation(s)
- Julio A Freyre-González
  - Regulatory Systems Biology Research Group, Program of Systems Biology, Center for Genomic Sciences, Universidad Nacional Autónoma de México, Cuernavaca, México
- Juan M Escorcia-Rodríguez
  - Regulatory Systems Biology Research Group, Program of Systems Biology, Center for Genomic Sciences, Universidad Nacional Autónoma de México, Cuernavaca, México
- Luis F Gutiérrez-Mondragón
  - Regulatory Systems Biology Research Group, Program of Systems Biology, Center for Genomic Sciences, Universidad Nacional Autónoma de México, Cuernavaca, México
  - Undergraduate Program in Genomic Sciences, Center for Genomic Sciences, Universidad Nacional Autónoma de México, Cuernavaca, México
- Jerónimo Martí-Vértiz
  - Regulatory Systems Biology Research Group, Program of Systems Biology, Center for Genomic Sciences, Universidad Nacional Autónoma de México, Cuernavaca, México
- Camila N Torres-Franco
  - Regulatory Systems Biology Research Group, Program of Systems Biology, Center for Genomic Sciences, Universidad Nacional Autónoma de México, Cuernavaca, México
- Andrea Zorro-Aranda
  - Regulatory Systems Biology Research Group, Program of Systems Biology, Center for Genomic Sciences, Universidad Nacional Autónoma de México, Cuernavaca, México
  - Department of Chemical Engineering, Universidad de Antioquia, Medellín, Colombia
16
Jiang X, Zhang X. RSNET: inferring gene regulatory networks by a redundancy silencing and network enhancement technique. BMC Bioinformatics 2022; 23:165. [PMID: 35524190 PMCID: PMC9074326 DOI: 10.1186/s12859-022-04696-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2021] [Accepted: 04/25/2022] [Indexed: 11/29/2022] Open
Abstract
Background: Current gene regulatory network (GRN) inference methods are notorious for the large number of indirect interactions hidden in their predictions. Separating indirect interactions from direct ones remains an important challenge in the reconstruction of GRNs. To address this issue, we developed a redundancy silencing and network enhancement technique (RSNET) for inferring GRNs. Results: To assess the performance of RSNET, we ran experiments on several gold-standard networks using a simulation study, the DREAM challenge dataset, and the Escherichia coli network. The results show that RSNET performed better than the compared methods in sensitivity and accuracy. As a case study, we used RSNET to construct a functional GRN for apple fruit ripening from gene expression data. Conclusions: In the proposed method, redundant interactions, including weak and indirect connections, are adaptively silenced by recursive optimization, and highly dependent nodes are constrained in the model to preserve the real interactions. This study provides a useful tool for inferring clean networks. Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-022-04696-w.
Affiliation(s)
- Xiaohan Jiang
  - Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, China; Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan, 430074, China; University of Chinese Academy of Sciences, Beijing, 100049, China
- Xiujun Zhang
  - Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, China; Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan, 430074, China
17
Deep neural network prediction of genome-wide transcriptome signatures - beyond the Black-box. NPJ Syst Biol Appl 2022; 8:9. [PMID: 35197482 PMCID: PMC8866467 DOI: 10.1038/s41540-022-00218-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/24/2021] [Accepted: 01/24/2022] [Indexed: 11/28/2022] Open
Abstract
Prediction algorithms for protein or gene structures, including transcription factor binding from sequence information, have been transformative in understanding gene regulation. Here we ask whether human transcriptomic profiles can be predicted solely from the expression of transcription factors (TFs). We find that the expression of 1600 TFs can explain >95% of the variance in 25,000 genes. Using the light-up technique to inspect the trained NN, we find an over-representation of known TF-gene regulations. Furthermore, the learned prediction network has a hierarchical organization. A smaller set of around 125 core TFs could explain close to 80% of the variance. Interestingly, reducing the number of TFs below 500 induces a rapid decline in prediction performance. Next, we evaluated the prediction model using transcriptional data from 22 human diseases. The TFs were sufficient to predict the dysregulation of the target genes (rho = 0.61, P < 10^−216). By inspecting the model, key causative TFs could be extracted for subsequent validation using disease-associated genetic variants. We demonstrate a methodology for constructing an interpretable neural network predictor, where analyses of the predictors identified key TFs that were inducing transcriptional changes during disease.
18
Gibbs CS, Jackson CA, Saldi GA, Tjärnberg A, Shah A, Watters A, De Veaux N, Tchourine K, Yi R, Hamamsy T, Castro DM, Carriero N, Gorissen BL, Gresham D, Miraldi ER, Bonneau R. High performance single-cell gene regulatory network inference at scale: The Inferelator 3.0. Bioinformatics 2022; 38:2519-2528. [PMID: 35188184 PMCID: PMC9048651 DOI: 10.1093/bioinformatics/btac117] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 05/25/2021] [Revised: 12/08/2021] [Accepted: 02/17/2022] [Indexed: 12/04/2022] Open
Abstract
Motivation: Gene regulatory networks define regulatory relationships between transcription factors and target genes within a biological system, and reconstructing them is essential for understanding cellular growth and function. Methods for inferring and reconstructing networks from genomics data have evolved rapidly over the last decade in response to advances in sequencing technology and machine learning. The scale of data collection has increased dramatically; the largest genome-wide gene expression datasets have grown from thousands of measurements to millions of single cells, and new technologies are on the horizon to increase to tens of millions of cells and above. Results: In this work, we present the Inferelator 3.0, which has been significantly updated to integrate data from distinct cell types to learn context-specific regulatory networks and aggregate them into a shared regulatory network, while retaining the functionality of the previous versions. The Inferelator is able to integrate the largest single-cell datasets and learn cell-type-specific gene regulatory networks. Compared to other network inference methods, the Inferelator learns new and informative Saccharomyces cerevisiae networks from single-cell gene expression data, measured by recovery of a known gold standard. We demonstrate its scaling capabilities by learning networks for multiple distinct neuronal and glial cell types in the developing Mus musculus brain at E18 from a large (1.3 million) single-cell gene expression dataset with paired single-cell chromatin accessibility data. Availability and implementation: The inferelator software is available on GitHub (https://github.com/flatironinstitute/inferelator) under the MIT license and has been released as python packages with associated documentation (https://inferelator.readthedocs.io/). Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Claudia Skok Gibbs
  - Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, NY, USA; Center For Data Science, NYU, New York, NY, USA
- Christopher A Jackson
  - Center For Genomics and Systems Biology, NYU, New York, NY, USA; Department of Biology, NYU, New York, NY, USA
- Giuseppe-Antonio Saldi
  - Center For Genomics and Systems Biology, NYU, New York, NY, USA; Department of Biology, NYU, New York, NY, USA
- Andreas Tjärnberg
  - Center For Genomics and Systems Biology, NYU, New York, NY, USA; Department of Biology, NYU, New York, NY, USA
- Aashna Shah
  - Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, NY, USA
- Aaron Watters
  - Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, NY, USA
- Nicholas De Veaux
  - Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, NY, USA
- Ren Yi
  - Courant Institute of Mathematical Sciences, Computer Science Department, NYU, New York, NY, USA
- Dayanne M Castro
  - Center For Genomics and Systems Biology, NYU, New York, NY, USA; Department of Biology, NYU, New York, NY, USA
- Nicholas Carriero
  - Flatiron Institute, Scientific Computing Core, Simons Foundation, New York, NY, USA
- Bram L Gorissen
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- David Gresham
  - Center For Genomics and Systems Biology, NYU, New York, NY, USA; Department of Biology, NYU, New York, NY, USA
- Emily R Miraldi
  - Divisions of Immunobiology and Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA; Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA
- Richard Bonneau
  - Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, NY, USA; Center For Data Science, NYU, New York, NY, USA; Center For Genomics and Systems Biology, NYU, New York, NY, USA; Department of Biology, NYU, New York, NY, USA; Courant Institute of Mathematical Sciences, Computer Science Department, NYU, New York, NY, USA
19
Deshpande A, Chu LF, Stewart R, Gitter A. Network inference with Granger causality ensembles on single-cell transcriptomics. Cell Rep 2022; 38:110333. [PMID: 35139376 DOI: 10.1016/j.celrep.2022.110333] [Citation(s) in RCA: 34] [Impact Index Per Article: 17.0] [Received: 02/05/2019] [Revised: 02/19/2021] [Accepted: 01/12/2022] [Indexed: 12/20/2022] Open
Abstract
Cellular gene expression changes throughout a dynamic biological process, such as differentiation. Pseudotimes estimate cells' progress along a dynamic process based on their individual gene expression states. Ordering the expression data by pseudotime provides information about the underlying regulator-gene interactions. Because the pseudotime distribution is not uniform, many standard mathematical methods are inapplicable for analyzing the ordered gene expression states. Here we present single-cell inference of networks using Granger ensembles (SINGE), an algorithm for gene regulatory network inference from ordered single-cell gene expression data. SINGE uses kernel-based Granger causality regression to smooth irregular pseudotimes and missing expression values. It aggregates predictions from an ensemble of regression analyses to compile a ranked list of candidate interactions between transcriptional regulators and target genes. In two mouse embryonic stem cell differentiation datasets, SINGE outperforms other contemporary algorithms. However, a more detailed examination reveals caveats about poor performance for individual regulators and uninformative pseudotimes.
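The core idea behind Granger causality, on which SINGE builds, can be shown with a minimal Python sketch: a regulator is said to Granger-cause a target if adding the regulator's past values reduces the error of predicting the target from its own past. The sketch is a plain lag-1 test on regularly sampled, simulated series. It does not implement SINGE's kernel smoothing of irregular pseudotimes or its ensemble aggregation, and the function names and simulation parameters are assumptions made for illustration.

```python
import math
import random


def rss_one(y, a):
    """Residual sum of squares for the fit y ~ c * a (no intercept)."""
    c = sum(p * q for p, q in zip(a, y)) / sum(p * p for p in a)
    return sum((q - c * p) ** 2 for p, q in zip(a, y))


def rss_two(y, a, b):
    """Residual sum of squares for y ~ c1 * a + c2 * b (2x2 normal equations)."""
    saa = sum(p * p for p in a)
    sbb = sum(r * r for r in b)
    sab = sum(p * r for p, r in zip(a, b))
    say = sum(p * q for p, q in zip(a, y))
    sby = sum(r * q for r, q in zip(b, y))
    det = saa * sbb - sab * sab
    c1 = (say * sbb - sby * sab) / det
    c2 = (sby * saa - say * sab) / det
    return sum((q - c1 * p - c2 * r) ** 2 for p, r, q in zip(a, b, y))


def granger_score(regulator, target):
    """Log ratio of restricted to full residual error; larger means stronger evidence."""
    y = target[1:]
    own_lag = target[:-1]
    reg_lag = regulator[:-1]
    return math.log(rss_one(y, own_lag) / rss_two(y, own_lag, reg_lag))


# Simulated toy series (assumption): x drives y with a one-step delay.
rng = random.Random(0)
x, y = [0.0], [0.0]
for _ in range(200):
    x_next = 0.6 * x[-1] + rng.gauss(0.0, 1.0)
    y_next = 0.5 * y[-1] + 0.8 * x[-1] + rng.gauss(0.0, 0.1)
    x.append(x_next)
    y.append(y_next)

score_xy = granger_score(x, y)  # x as candidate regulator of y
score_yx = granger_score(y, x)  # reverse direction, expected near zero
```

Ranking all regulator-target pairs by such a score gives a candidate edge list of the kind the abstract describes; the authors' method additionally aggregates many regressions with different settings.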
Affiliation(s)
- Atul Deshpande
  - Department of Electrical and Computer Engineering, University of Wisconsin - Madison, Madison, WI 53706, USA; Morgridge Institute for Research, Madison, WI 53715, USA
- Li-Fang Chu
  - Morgridge Institute for Research, Madison, WI 53715, USA
- Ron Stewart
  - Morgridge Institute for Research, Madison, WI 53715, USA
- Anthony Gitter
  - Morgridge Institute for Research, Madison, WI 53715, USA; Department of Biostatistics and Medical Informatics, University of Wisconsin - Madison, Madison, WI 53792, USA
20
Allaway KC, Gabitto MI, Wapinski O, Saldi G, Wang CY, Bandler RC, Wu SJ, Bonneau R, Fishell G. Genetic and epigenetic coordination of cortical interneuron development. Nature 2021; 597:693-697. [PMID: 34552240 DOI: 10.1038/s41586-021-03933-1] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Received: 04/06/2020] [Accepted: 08/18/2021] [Indexed: 11/09/2022]
Abstract
One of the hallmarks of the cerebral cortex is the extreme diversity of interneurons [1-3]. The two largest subtypes of cortical interneurons, parvalbumin- and somatostatin-positive cells, are morphologically and functionally distinct in adulthood but arise from common lineages within the medial ganglionic eminence [4-11]. This makes them an attractive model for studying the generation of cell diversity. Here we examine how developmental changes in transcription and chromatin structure enable these cells to acquire distinct identities in the mouse cortex. Generic interneuron features are first detected upon cell cycle exit through the opening of chromatin at distal elements. By constructing cell-type-specific gene regulatory networks, we observed that parvalbumin- and somatostatin-positive cells initiate distinct programs upon settling within the cortex. We used these networks to model the differential transcriptional requirement of a shared regulator, Mef2c, and confirmed the accuracy of our predictions through experimental loss-of-function experiments. We therefore reveal how a common molecular program diverges to enable these neuronal subtypes to acquire highly specialized properties by adulthood. Our methods provide a framework for examining the emergence of cellular diversity, as well as for quantifying and predicting the effect of candidate genes on cell-type-specific development.
Affiliation(s)
- Kathryn C Allaway
  - Neuroscience Institute, New York University, New York, NY, USA; Department of Neurobiology, Harvard Medical School, Boston, MA, USA; Stanley Center for Psychiatric Research, Broad Institute, Cambridge, MA, USA
- Mariano I Gabitto
  - Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA
- Orly Wapinski
  - Stanley Center for Psychiatric Research, Broad Institute, Cambridge, MA, USA
- Giuseppe Saldi
  - Department of Neurobiology, Harvard Medical School, Boston, MA, USA; Stanley Center for Psychiatric Research, Broad Institute, Cambridge, MA, USA; Department of Biology, New York University, New York, NY, USA
- Chen-Yu Wang
  - Department of Neurobiology, Harvard Medical School, Boston, MA, USA; Stanley Center for Psychiatric Research, Broad Institute, Cambridge, MA, USA
- Rachel C Bandler
  - Neuroscience Institute, New York University, New York, NY, USA; Department of Neurobiology, Harvard Medical School, Boston, MA, USA; Stanley Center for Psychiatric Research, Broad Institute, Cambridge, MA, USA
- Sherry Jingjing Wu
  - Department of Neurobiology, Harvard Medical School, Boston, MA, USA; Stanley Center for Psychiatric Research, Broad Institute, Cambridge, MA, USA
- Richard Bonneau
  - Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA; Department of Biology, New York University, New York, NY, USA; Center for Data Science, New York University, New York, NY, USA
- Gord Fishell
  - Department of Neurobiology, Harvard Medical School, Boston, MA, USA; Stanley Center for Psychiatric Research, Broad Institute, Cambridge, MA, USA
21
Wu AP, Peng J, Berger B, Cho H. Bayesian information sharing enhances detection of regulatory associations in rare cell types. Bioinformatics 2021; 37:i349-i357. [PMID: 34252956 PMCID: PMC8275330 DOI: 10.1093/bioinformatics/btab269] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: Recent advances in single-cell RNA-sequencing (scRNA-seq) technologies promise to enable the study of gene regulatory associations at unprecedented resolution in diverse cellular contexts. However, identifying unique regulatory associations observed only in specific cell types or conditions remains a key challenge; this is particularly so for rare transcriptional states whose sample sizes are too small for existing gene regulatory network inference methods to be effective. RESULTS: We present ShareNet, a Bayesian framework for boosting the accuracy of cell type-specific gene regulatory networks by propagating information across related cell types via an information sharing structure that is adaptively optimized for a given single-cell dataset. The techniques we introduce can be used with a range of general network inference algorithms to enhance the output for each cell type. We demonstrate the enhanced accuracy of our approach on three benchmark scRNA-seq datasets. We find that our inferred cell type-specific networks also uncover key changes in gene associations that underpin the complex rewiring of regulatory networks across cell types, tissues and dynamic biological processes. Our work presents a path toward extracting deeper insights about cell type-specific gene regulation in the rapidly growing compendium of scRNA-seq datasets. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. AVAILABILITY AND IMPLEMENTATION: The code for ShareNet is available at http://sharenet.csail.mit.edu and https://github.com/alexw16/sharenet.
Affiliation(s)
- Alexander P Wu
  - Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA 02139, USA
- Jian Peng
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, IL 61801, USA
- Bonnie Berger
  - Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA 02139, USA; Department of Mathematics, MIT, Cambridge, MA 02139, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Hyunghoon Cho
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
22
Ojo BA, VanDussen KL, Rosen MJ. The Promise of Patient-Derived Colon Organoids to Model Ulcerative Colitis. Inflamm Bowel Dis 2021; 28:299-308. [PMID: 34251431 PMCID: PMC8804507 DOI: 10.1093/ibd/izab161] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/26/2021] [Indexed: 12/11/2022]
Abstract
Physiologic, molecular, and genetic findings all point to impaired intestinal epithelial function as a key element in the multifactorial pathogenesis of ulcerative colitis (UC). The lack of epithelial-directed therapies is a conspicuous weakness of our UC therapeutic armamentarium. However, a critical barrier to new drug discovery is the lack of preclinical human models of UC. Patient tissue-derived colon epithelial organoids (colonoids) are primary epithelial stem cell-derived in vitro structures capable of self-organization and self-renewal that hold great promise as a human preclinical model for UC drug development. Several single and multi-tissue systems for colonoid culture have been developed, including 3-dimensional colonoids grown in a gelatinous extracellular matrix, 2-dimensional polarized monolayers, and colonoids on a chip that model luminal and blood flow and nutrient delivery. A small number of pioneering studies suggest that colonoids derived from UC patients retain some disease-related transcriptional and epigenetic changes, but they also raise questions regarding the persistence of inflammatory transcriptional programs in culture over time. Additional research is needed to fully characterize the extent to which and under what conditions colonoids accurately model disease-associated epithelial molecular and functional aberrations. With further advancement and standardization of colonoid culture methodology, colonoids will likely become an important tool for realizing precision medicine in UC.
Affiliation(s)
- Babajide A Ojo
  - Divisions of Gastroenterology, Hepatology, and Nutrition, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States
- Kelli L VanDussen
  - Divisions of Gastroenterology, Hepatology, and Nutrition, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States; Developmental Biology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States; Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States
- Michael J Rosen
  - Divisions of Gastroenterology, Hepatology, and Nutrition, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States; Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States; Address correspondence to: Michael J. Rosen, MD, MSCI, Division of Gastroenterology, Hepatology, and Nutrition, Cincinnati Children’s Hospital Medical Center, 3333 Burnet Avenue, MLC 2010, Cincinnati, Ohio, 45229, United States. E-mail:
|
23
|
Tripathi RK, Wilkins O. Single cell gene regulatory networks in plants: Opportunities for enhancing climate change stress resilience. PLANT, CELL & ENVIRONMENT 2021; 44:2006-2017. [PMID: 33522607 PMCID: PMC8359182 DOI: 10.1111/pce.14012] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 09/01/2020] [Revised: 01/21/2021] [Accepted: 01/22/2021] [Indexed: 05/05/2023]
Abstract
Global warming poses major challenges for plant survival and agricultural productivity. Thus, efforts to enhance stress resilience in plants are key strategies for protecting food security. Gene regulatory networks (GRNs) are a critical mechanism conferring stress resilience. Until recently, predicting GRNs of the individual cells that make up plants and other multicellular organisms was impeded by aggregate population scale measurements of transcriptome and other genome-scale features. With the advancement of high-throughput single cell RNA-seq and other single cell assays, learning GRNs for individual cells is now possible, in principle. In this article, we report on recent advances in experimental and analytical methodologies for single cell sequencing assays especially as they have been applied to the study of plants. We highlight recent advances and ongoing challenges for scGRN prediction, and finally, we highlight the opportunity to use scGRN discovery for studying and ultimately enhancing abiotic stress resilience in plants.
Affiliation(s)
- Rajiv K. Tripathi
  - Department of Biological Sciences, University of Manitoba, Winnipeg, Manitoba, Canada
- Olivia Wilkins
  - Department of Biological Sciences, University of Manitoba, Winnipeg, Manitoba, Canada
|
24
|
Stein-O'Brien GL, Ainsile MC, Fertig EJ. Forecasting cellular states: from descriptive to predictive biology via single-cell multiomics. CURRENT OPINION IN SYSTEMS BIOLOGY 2021; 26:24-32. [PMID: 34660940 PMCID: PMC8516130 DOI: 10.1016/j.coisb.2021.03.008] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/17/2022]
Abstract
As the single cell field races to characterize each cell type, state, and behavior, the complexity of the computational analysis approaches the complexity of the biological systems. Single cell and imaging technologies now enable unprecedented measurements of state transitions in biological systems, providing high-throughput data that capture tens of thousands of measurements on hundreds of thousands of samples. Thus, the definition of cell type and state is evolving to encompass the broad range of biological questions now attainable. Answering these questions requires the development of computational tools for integrated multi-omics analysis. Merged with mathematical models, these algorithms will be able to forecast future states of biological systems, going from statistical inferences of phenotypes to time course predictions of the biological systems with dynamic maps analogous to weather systems. Thus, systems biology for forecasting biological system dynamics from multi-omic data represents the future of cell biology, empowering a new generation of technology-driven predictive medicine.
Affiliation(s)
- Genevieve L Stein-O'Brien
  - Department of Oncology, Division of Biostatistics and Bioinformatics, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins School of Medicine, Baltimore, MD
  - Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD
  - McKusick-Nathans Department of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD
  - Kavli Neuroscience Discovery Institute, Johns Hopkins University, Baltimore, MD
  - Convergence Institute, Johns Hopkins University, Baltimore, MD
- Michaela C Ainsile
  - Department of Oncology, Division of Biostatistics and Bioinformatics, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins School of Medicine, Baltimore, MD
- Elana J Fertig
  - Department of Oncology, Division of Biostatistics and Bioinformatics, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins School of Medicine, Baltimore, MD
  - Convergence Institute, Johns Hopkins University, Baltimore, MD
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD
  - Department of Applied Mathematics & Statistics, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD
|
25
|
Duan B, Chen S, Chen X, Zhu C, Tang C, Wang S, Gao Y, Fu S, Liu Q. Integrating multiple references for single-cell assignment. Nucleic Acids Res 2021; 49:e80. [PMID: 34037791 PMCID: PMC8373058 DOI: 10.1093/nar/gkab380] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/18/2021] [Revised: 04/13/2021] [Accepted: 04/27/2021] [Indexed: 01/09/2023]
Abstract
Efficient single-cell assignment is essential for single-cell sequencing data analysis. With the explosive growth of single-cell sequencing data, multiple single-cell sequencing data sources are available for the same kind of tissue, which can be integrated to further improve single-cell assignment; however, an efficient integration strategy is still lacking due to the great challenges of data heterogeneity existing in multiple references. To this end, we present mtSC, a flexible single-cell assignment framework that integrates multiple references based on multitask deep metric learning designed specifically for cell type identification within tissues with multiple single-cell sequencing data as references. We evaluated mtSC on a comprehensive set of publicly available benchmark datasets and demonstrated its state-of-the-art effectiveness for integrative single-cell assignment with multiple references.
Affiliation(s)
- Bin Duan
  - Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Shaoqi Chen
  - Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Xiaohan Chen
  - Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Chenyu Zhu
  - Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Chen Tang
  - Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Shuguang Wang
  - Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Yicheng Gao
  - Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Shaliu Fu
  - Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Qi Liu
  - Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
|
26
|
Gan Y, Xin Y, Hu X, Zou G. Inferring gene regulatory network from single-cell transcriptomic data by integrating multiple prior networks. Comput Biol Chem 2021; 93:107512. [PMID: 34044202 DOI: 10.1016/j.compbiolchem.2021.107512] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/02/2021] [Accepted: 05/12/2021] [Indexed: 11/29/2022]
Abstract
A gene regulatory network models the interactions between transcription factors and target genes. Reconstructing gene regulatory networks is critically important for understanding gene function in a particular cellular context, providing key insights into complex biological systems. We develop a new computational method, named iMPRN, which integrates multiple prior networks to infer a regulatory network. Based on the network component analysis model, iMPRN adopts linear regression, graph embedding, and elastic networks to optimize each prior network in line with the specific biological context. For each rewired prior network, iMPRN evaluates the confidence of the regulatory edges based on B scores and finally integrates these optimized networks. We validate the effectiveness of iMPRN by comparing it with four widely used gene regulatory network reconstruction algorithms on a simulation data set. The results show that iMPRN can infer the gene regulatory network more accurately. Further, on a real scRNA-seq dataset, iMPRN is applied to reconstruct gene regulatory networks for malignant and nonmalignant head and neck tumor cells separately, demonstrating distinctive differences in their corresponding regulatory networks.
Affiliation(s)
- Yanglan Gan
  - School of Computer Science and Technology, Donghua University, Shanghai, China
- Yongchang Xin
  - School of Computer Science and Technology, Donghua University, Shanghai, China
- Xin Hu
  - School of Computer Science and Technology, Donghua University, Shanghai, China
- Guobing Zou
  - School of Computer Engineering and Science, Shanghai University, Shanghai, China
|
27
|
Network Analysis of Gene Transcriptions of Arabidopsis thaliana in Spaceflight Microgravity. Genes (Basel) 2021; 12:genes12030337. [PMID: 33668919 PMCID: PMC7996555 DOI: 10.3390/genes12030337] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/26/2020] [Revised: 02/08/2021] [Accepted: 02/23/2021] [Indexed: 02/06/2023]
Abstract
The transcriptomic datasets of the plant model organism Arabidopsis thaliana grown in the International Space Station, provided by GeneLab, have been mined to isolate the impact of spaceflight microgravity on the expression of genes related to root growth. A set of computational tools is used to identify the hub genes that respond differently in spaceflight with controlled lighting compared to on the ground. These computational tools, based on graph-theoretic approaches, are used to infer gene regulatory networks from the transcriptomic datasets. The three main algorithms used for network analyses are Least Absolute Shrinkage and Selection Operator (LASSO), Pearson correlation, and the Hyperlink-Induced Topic Search (HITS) algorithm. Graph-based spectral analyses reveal distinct properties of the spaceflight microgravity networks for the Wassilewskija (WS), Columbia (Col)-0, and mutant phytochromeD (phyD) ecotypes. The hub genes that are significantly altered in spaceflight microgravity are mainly involved in cell wall synthesis, protein transport, response to auxin, stress responses, and catabolic processes. Network analysis highlights five important root growth-regulating hub genes that have the highest outdegree distribution in spaceflight microgravity networks. These protein-coding genes are identified from the Gene Regulatory Networks (GRNs) corresponding to the spaceflight total light environment. Furthermore, network analysis uncovers genes that encode nucleotide-diphospho-sugar interconversion enzymes that have higher transcriptional regulation in spaceflight microgravity and are involved in cell wall biosynthesis.
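To make the hub-gene idea in this abstract concrete, the following is a minimal sketch of HITS on a small directed network, written with the Python standard library only. The node names and edge list are hypothetical and chosen for illustration; they are not taken from the study, and the abstract does not say how its own HITS step was implemented.

```python
import math

def hits(nodes, edges, iterations=50):
    """Return (hub, authority) scores for a directed graph via power iteration."""
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: sum the hub scores of every node pointing at this node.
        auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            auth[dst] += hub[src]
        # Hub update: sum the authority scores of every node this node points at.
        new_hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_hub[src] += auth[dst]
        hub = new_hub
        # L2-normalise both score vectors so the values stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for n in scores:
                scores[n] /= norm
    return hub, auth

# Hypothetical toy network: an edge (a, b) means "a regulates b".
nodes = ["TF1", "geneA", "geneB", "geneC"]
edges = [("TF1", "geneA"), ("TF1", "geneB"), ("TF1", "geneC"), ("geneA", "geneB")]

hub_scores, authority_scores = hits(nodes, edges)
top_hub = max(hub_scores, key=hub_scores.get)
top_authority = max(authority_scores, key=authority_scores.get)
print("top hub:", top_hub, "| top authority:", top_authority)
```

In this toy graph the node with the most outgoing regulatory edges receives the largest hub score, which is the sense in which a high-outdegree gene is treated as a hub.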
|
28
|
Khan A, Kumar V, Srivastava A, Saxena G, Verma PC. Biomarker-based evaluation of cytogenotoxic potential of glyphosate in Vigna mungo (L.) Hepper genotypes. ENVIRONMENTAL MONITORING AND ASSESSMENT 2021; 193:73. [PMID: 33469782 DOI: 10.1007/s10661-021-08865-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/08/2020] [Accepted: 01/11/2021] [Indexed: 06/12/2023]
Abstract
Herbicides have proven to be a boon for agricultural fields. Their inherent property to kill weeds and unwanted vegetation makes them an essential biological tool for farmers and agricultural systems. Besides being capable of destroying weeds, they also exhibit certain effects on non-target crop plants. In the present study, a laboratory experiment was performed to assess the effect of glyphosate on Vigna mungo root meristem cells. Seeds of five different genotypes of V. mungo were treated with a series of concentrations of glyphosate ranging from 1 to 10 mM, and their effects on mitotic cell division were studied. Healthy and uniform-sized seeds were selected and allowed to grow in Petri plates for 3 days, and all the doses were maintained in triplicate. Roots were fixed at day 3 after treatment (DAT) for cytological microscopic slide preparation. The results indicate a dose-dependent reduction in the mitotic index in all the genotypes and an increase in the percentage of chromosomal aberrations (CAs) and relative abnormality rate (RAR). The most commonly observed chromosome aberrations at lower doses (< 6 mM) were fragments, stickiness, and disoriented metaphase, while at higher doses (6 to 10 mM) bridges, laggards, spindle disorientation, and clumping were obvious. The increase in the percentage of CAs and RAR indicates the inhibitory effect of glyphosate on cell cycle progression at various stages in root tip cells. The present study is a fine example of a biomarker-based genotoxic assessment of mitotic damage caused by glyphosate.
Affiliation(s)
- Adiba Khan
  - In Vitro Culture and Plant Genetics Unit, Department of Botany, Faculty of Science, University of Lucknow, Lucknow, UP, 226007, India
- Vaibhav Kumar
  - In Vitro Culture and Plant Genetics Unit, Department of Botany, Faculty of Science, University of Lucknow, Lucknow, UP, 226007, India
- Alka Srivastava
  - In Vitro Culture and Plant Genetics Unit, Department of Botany, Faculty of Science, University of Lucknow, Lucknow, UP, 226007, India
- Gauri Saxena
  - In Vitro Culture and Plant Genetics Unit, Department of Botany, Faculty of Science, University of Lucknow, Lucknow, UP, 226007, India
- Praveen C Verma
  - Department of Molecular Biology and Biotechnology, CSIR-National Botanical Research Institute, Lucknow, UP, 226001, India
|
29
|
Musilova J, Sedlar K. Tools for time-course simulation in systems biology: a brief overview. Brief Bioinform 2021; 22:6076933. [PMID: 33423059 DOI: 10.1093/bib/bbaa392] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/28/2020] [Revised: 11/27/2020] [Accepted: 11/30/2020] [Indexed: 11/13/2022]
Abstract
Dynamic modeling of biological systems is essential for understanding all properties of a given organism, as it allows us to look not only at the static picture of an organism but also at its behavior under various conditions. With the increasing amount of experimental data, the number of tools that enable dynamic analysis also grows. However, various tools are based on different approaches, use different types of data and offer different functions for analyses, so it can be difficult to choose the most suitable tool for a selected type of model. Here, we present a brief overview containing descriptions of 50 tools for the reconstruction of biological models, their time-course simulation and dynamic analysis. We examined each tool using test data and divided them based on the qualitative and quantitative nature of the mathematical apparatus they use.
Affiliation(s)
- Jana Musilova
  - Department of Biomedical Engineering, Faculty of Electrical Engineering and Communication, Brno University of Technology, Brno, Czechia
- Karel Sedlar
  - Department of Biomedical Engineering, Faculty of Electrical Engineering and Communication, Brno University of Technology, Brno, Czechia
|
30
|
Hoffmann M, Pachl E, Hartung M, Stiegler V, Baumbach J, Schulz MH, List M. SPONGEdb: a pan-cancer resource for competing endogenous RNA interactions. NAR Cancer 2021; 3:zcaa042. [PMID: 34316695 PMCID: PMC8210024 DOI: 10.1093/narcan/zcaa042] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 08/11/2020] [Revised: 11/12/2020] [Accepted: 12/04/2020] [Indexed: 12/12/2022]
Abstract
microRNAs (miRNAs) are post-transcriptional regulators involved in many biological processes and human diseases, including cancer. The majority of transcripts compete over a limited pool of miRNAs, giving rise to a complex network of competing endogenous RNA (ceRNA) interactions. Currently, gene-regulatory networks focus mostly on transcription factor-mediated regulation, and dedicated efforts for charting ceRNA regulatory networks are scarce. Recently, it became possible to infer ceRNA interactions genome-wide from matched gene and miRNA expression data. Here, we inferred ceRNA regulatory networks for 22 cancer types and a pan-cancer ceRNA network based on data from The Cancer Genome Atlas. To make these networks accessible to the biomedical community, we present SPONGEdb, a database offering a user-friendly web interface to browse and visualize ceRNA interactions and an application programming interface accessible by accompanying R and Python packages. SPONGEdb allows researchers to identify potent ceRNA regulators via network centrality measures and to assess their potential as cancer biomarkers through survival, cancer hallmark and gene set enrichment analysis. In summary, SPONGEdb is a feature-rich web resource supporting the community in studying ceRNA regulation within and across cancer types.
Affiliation(s)
- Markus Hoffmann
  - Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, 85354 Freising, Germany
- Elisabeth Pachl
  - Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, 85354 Freising, Germany
- Michael Hartung
  - Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, 85354 Freising, Germany
- Veronika Stiegler
  - Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, 85354 Freising, Germany
- Jan Baumbach
  - Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, 85354 Freising, Germany
- Marcel H Schulz
  - Institute for Cardiovascular Regeneration, Goethe University, 60596 Frankfurt am Main, Germany
- Markus List
  - Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, 85354 Freising, Germany
|
31
|
Hütt MT, Lesne A. Gene Regulatory Networks: Dissecting Structure and Dynamics. SYSTEMS MEDICINE 2021. [DOI: 10.1016/b978-0-12-801238-3.11467-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022]
|
32
|
Mignone P, Pio G, Džeroski S, Ceci M. Multi-task learning for the simultaneous reconstruction of the human and mouse gene regulatory networks. Sci Rep 2020; 10:22295. [PMID: 33339842 PMCID: PMC7749184 DOI: 10.1038/s41598-020-78033-7] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 04/10/2020] [Accepted: 10/29/2020] [Indexed: 12/31/2022]
Abstract
The reconstruction of Gene Regulatory Networks (GRNs) from gene expression data, supported by machine learning approaches, has received increasing attention in recent years. The task at hand is to identify regulatory links between genes in a network. However, existing methods often suffer when the number of labeled examples is low or when no negative examples are available. In this paper we propose a multi-task method that is able to simultaneously reconstruct the human and the mouse GRNs using the similarities between the two. This is done by exploiting, in a transfer learning approach, possible dependencies that may exist among them. Simultaneously, we solve the issues arising from the limited availability of examples of links by relying on a novel clustering-based approach, able to estimate the degree of certainty of unlabeled examples of links, so that they can be exploited during the training together with the labeled examples. Our experiments show that the proposed method can reconstruct both the human and the mouse GRNs more effectively compared to reconstructing each network separately. Moreover, it significantly outperforms three state-of-the-art transfer learning approaches that, analogously to our method, can exploit the knowledge coming from both organisms. Finally, a specific robustness analysis reveals that, even when the number of labeled examples is very low with respect to the number of unlabeled examples, the proposed method is almost always able to outperform its single-task counterpart.
Affiliation(s)
- Paolo Mignone
  - Department of Computer Science, University of Bari Aldo Moro, Bari, 70125, Italy
- Gianvito Pio
  - Department of Computer Science, University of Bari Aldo Moro, Bari, 70125, Italy
- Sašo Džeroski
  - Department of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, 1000, Slovenia
- Michelangelo Ceci
  - Department of Computer Science, University of Bari Aldo Moro, Bari, 70125, Italy; Department of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, 1000, Slovenia
|
33
|
Kwon MS, Lee BT, Lee SY, Kim HU. Modeling regulatory networks using machine learning for systems metabolic engineering. Curr Opin Biotechnol 2020; 65:163-170. [DOI: 10.1016/j.copbio.2020.02.014] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 01/20/2020] [Revised: 02/23/2020] [Accepted: 02/26/2020] [Indexed: 12/18/2022]
|
34
|
Baur B, Shin J, Zhang S, Roy S. Data integration for inferring context-specific gene regulatory networks. CURRENT OPINION IN SYSTEMS BIOLOGY 2020; 23:38-46. [PMID: 33225112 PMCID: PMC7676633 DOI: 10.1016/j.coisb.2020.09.005] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/28/2022]
Abstract
Transcriptional regulatory networks control context-specific gene expression patterns and play important roles in normal and disease processes. Advances in genomics are rapidly increasing our ability to measure different components of the regulation machinery at the single-cell and bulk population level. An important challenge is to combine different types of regulatory genomic measurements to construct a more complete picture of gene regulatory networks across different disease, environmental, and developmental contexts. In this review, we focus on recent computational methods that integrate regulatory genomic data sets to infer context specificity and dynamics in regulatory networks.
Affiliation(s)
- Brittany Baur
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, 53715, USA
- Junha Shin
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, 53715, USA
- Shilu Zhang
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, 53715, USA
- Sushmita Roy
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, 53715, USA
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, 53715, USA
|
35
|
Wu N, Yin F, Ou-Yang L, Zhu Z, Xie W. Joint learning of multiple gene networks from single-cell gene expression data. Comput Struct Biotechnol J 2020; 18:2583-2595. [PMID: 33033579 PMCID: PMC7527714 DOI: 10.1016/j.csbj.2020.09.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2020] [Revised: 08/31/2020] [Accepted: 09/01/2020] [Indexed: 11/24/2022]
Abstract
Inferring gene networks from gene expression data is important for understanding functional organization within cells. With the accumulation of single-cell RNA sequencing (scRNA-seq) data, it is possible to infer gene networks at the single-cell level. However, due to the characteristics of scRNA-seq data, such as cellular heterogeneity and high sparsity caused by dropout events, traditional network inference methods may not be suitable for scRNA-seq data. In this study, we introduce a novel joint Gaussian copula graphical model (JGCGM) to jointly estimate multiple gene networks for multiple cell subgroups from scRNA-seq data. Our model can deal with non-Gaussian data with missing values and identify the common and unique network structures of multiple cell subgroups, which makes it suitable for scRNA-seq data. Extensive experiments on synthetic data demonstrate that our proposed model outperforms other state-of-the-art network inference models. We apply our model to real scRNA-seq data sets to infer gene networks of different cell subgroups. Hub genes in the estimated gene networks are found to be biologically significant.
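As a rough illustration of why a Gaussian copula helps with non-Gaussian expression values, the sketch below shows only the rank-based normal-score transform that copula models commonly rely on, followed by an ordinary correlation on the transformed values. It is not the JGCGM estimator: the joint estimation across cell subgroups and the handling of missing values are not shown, and the function names and toy expression vectors are assumptions made for this example.

```python
import math
from statistics import NormalDist

def normal_scores(values):
    """Map values to standard-normal scores through their ranks (ties share the average rank)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of tied values that starts at position i.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank for the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    standard_normal = NormalDist()
    # Dividing by n + 1 keeps every probability strictly inside (0, 1).
    return [standard_normal.inv_cdf(r / (n + 1)) for r in ranks]

def pearson(x, y):
    """Plain Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

# Hypothetical, heavily skewed expression values for two genes across five cells.
gene_x = [0, 0, 1, 5, 40]
gene_y = [0, 1, 2, 3, 100]

raw_corr = pearson(gene_x, gene_y)
copula_corr = pearson(normal_scores(gene_x), normal_scores(gene_y))
print(f"raw: {raw_corr:.3f}  normal-score: {copula_corr:.3f}")
```

Because the transform depends only on ranks, any monotone rescaling of a gene's expression leaves its normal scores unchanged, which is the property that makes the approach tolerant of skewed count data.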
Affiliation(s)
- Nuosi Wu
  - College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China
- Fu Yin
  - College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China
- Le Ou-Yang
  - College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China
  - Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen Key Laboratory of Media Security, and Guangdong Laboratory of Artificial Intelligence and Digital Economy(SZ), Shenzhen University, Shenzhen, China
  - Shenzhen Institute of Artificial Intelligence and Robotics for Society, Shenzhen, China
- Zexuan Zhu
  - College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
- Weixin Xie
  - College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China
|
36
|
A 1p/19q Codeletion-Associated Immune Signature for Predicting Lower Grade Glioma Prognosis. Cell Mol Neurobiol 2020; 42:709-722. [DOI: 10.1007/s10571-020-00959-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 06/28/2020] [Accepted: 08/30/2020] [Indexed: 12/19/2022]
|
37
|
Perturbation-based gene regulatory network inference to unravel oncogenic mechanisms. Sci Rep 2020; 10:14149. [PMID: 32843692 PMCID: PMC7447758 DOI: 10.1038/s41598-020-70941-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 10/26/2019] [Accepted: 07/22/2020] [Indexed: 01/01/2023]
Abstract
The gene regulatory network (GRN) of human cells encodes mechanisms to ensure proper functioning. However, if this GRN is dysregulated, the cell may enter into a disease state such as cancer. Understanding the GRN as a system can therefore help identify novel mechanisms underlying disease, which can lead to new therapies. To deduce regulatory interactions relevant to cancer, we applied a recent computational inference framework to data from perturbation experiments in squamous carcinoma cell line A431. GRNs were inferred using several methods, and the false discovery rate was controlled by the NestBoot framework. We developed a novel approach to assess the predictiveness of inferred GRNs against validation data, despite the lack of a gold standard. The best GRN was significantly more predictive than the null model, both in cross-validated benchmarks and for an independent dataset of the same genes under a different perturbation design. The inferred GRN captures many known regulatory interactions central to cancer-relevant processes in addition to predicting many novel interactions, some of which were experimentally validated, thus providing mechanistic insights that are useful for future cancer research.
|
38
|
Turki T, Taguchi YH. SCGRNs: Novel supervised inference of single-cell gene regulatory networks of complex diseases. Comput Biol Med 2020; 118:103656. [PMID: 32174324 DOI: 10.1016/j.compbiomed.2020.103656] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 01/06/2020] [Revised: 02/06/2020] [Accepted: 02/07/2020] [Indexed: 12/19/2022]
|
39
|
Watson A, Habib M, Bapteste E. Phylosystemics: Merging Phylogenomics, Systems Biology, and Ecology to Study Evolution. Trends Microbiol 2020; 28:176-190. [DOI: 10.1016/j.tim.2019.10.011] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/19/2019] [Revised: 10/21/2019] [Accepted: 10/22/2019] [Indexed: 11/28/2022]
|
40
|
Kang Y, Patel NR, Shively C, Recio PS, Chen X, Wranik BJ, Kim G, McIsaac RS, Mitra R, Brent MR. Dual threshold optimization and network inference reveal convergent evidence from TF binding locations and TF perturbation responses. Genome Res 2020; 30:459-471. [PMID: 32060051 PMCID: PMC7111528 DOI: 10.1101/gr.259655.119] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 11/27/2019] [Accepted: 02/11/2020] [Indexed: 12/22/2022]
Abstract
A high-confidence map of the direct, functional targets of each transcription factor (TF) requires convergent evidence from independent sources. Two significant sources of evidence are TF binding locations and the transcriptional responses to direct TF perturbations. Systematic data sets of both types exist for yeast and human, but they rarely converge on a common set of direct, functional targets for a TF. Even the few genes that are both bound and responsive may not be direct functional targets. Our analysis shows that when there are many nonfunctional binding sites and many indirect targets, nonfunctional sites are expected to occur in the cis-regulatory DNA of indirect targets by chance. To address this problem, we introduce dual threshold optimization (DTO), a new method for setting significance thresholds on binding and perturbation-response data, and show that it improves convergence. It also enables comparison of binding data to perturbation-response data that have been processed by network inference algorithms, which further improves convergence. The combination of dual threshold optimization and network inference greatly expands the high-confidence TF network map in both yeast and human. Next, we analyze a comprehensive new data set measuring the transcriptional response shortly after inducing overexpression of a yeast TF. We also present a new yeast binding location data set obtained by transposon calling cards and compare it to recent ChIP-exo data. These new data sets improve convergence and expand the high-confidence network synergistically.
Affiliation(s)
- Yiming Kang
- Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA; Department of Computer Science and Engineering, Washington University, St. Louis, Missouri 63130, USA
| | - Nikhil R Patel
- Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA; Department of Computer Science and Engineering, Washington University, St. Louis, Missouri 63130, USA
| | - Christian Shively
- Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA; Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
| | - Pamela Samantha Recio
- Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA; Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
| | - Xuhua Chen
- Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA; Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
| | - Bernd J Wranik
- Calico Life Sciences LLC, South San Francisco, California 94080, USA
| | - Griffin Kim
- Calico Life Sciences LLC, South San Francisco, California 94080, USA
| | - R Scott McIsaac
- Calico Life Sciences LLC, South San Francisco, California 94080, USA
| | - Robi Mitra
- Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA; Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
| | - Michael R Brent
- Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA; Department of Computer Science and Engineering, Washington University, St. Louis, Missouri 63130, USA
| |
|
41
|
Jackson CA, Castro DM, Saldi GA, Bonneau R, Gresham D. Gene regulatory network reconstruction using single-cell RNA sequencing of barcoded genotypes in diverse environments. eLife 2020; 9:e51254. [PMID: 31985403 PMCID: PMC7004572 DOI: 10.7554/elife.51254] [Citation(s) in RCA: 87] [Impact Index Per Article: 21.8] [Received: 08/21/2019] [Accepted: 01/10/2020] [Indexed: 11/13/2022] Open
Abstract
Understanding how gene expression programs are controlled requires identifying regulatory relationships between transcription factors and target genes. Gene regulatory networks are typically constructed from gene expression data acquired following genetic perturbation or environmental stimulus. Single-cell RNA sequencing (scRNAseq) captures the gene expression state of thousands of individual cells in a single experiment, offering advantages in combinatorial experimental design, large numbers of independent measurements, and accessing the interaction between the cell cycle and environmental responses that is hidden by population-level analysis of gene expression. To leverage these advantages, we developed a method for scRNAseq in budding yeast (Saccharomyces cerevisiae). We pooled diverse transcriptionally barcoded gene deletion mutants in 11 different environmental conditions and determined their expression state by sequencing 38,285 individual cells. We benchmarked a framework for learning gene regulatory networks from scRNAseq data that incorporates multitask learning and constructed a global gene regulatory network comprising 12,228 interactions.
Affiliation(s)
- Christopher A Jackson
- Center for Genomics and Systems Biology, New York University, New York, United States
- Department of Biology, New York University, New York, United States
| | - Richard Bonneau
- Center for Genomics and Systems Biology, New York University, New York, United States
- Department of Biology, New York University, New York, United States
- Courant Institute of Mathematical Sciences, Computer Science Department, New York University, New York, United States
- Center for Data Science, New York University, New York, United States
- Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, United States
| | - David Gresham
- Center for Genomics and Systems Biology, New York University, New York, United States
- Department of Biology, New York University, New York, United States
| |
|
42
|
Law SR, Kellgren TG, Björk R, Ryden P, Keech O. Centralization Within Sub-Experiments Enhances the Biological Relevance of Gene Co-expression Networks: A Plant Mitochondrial Case Study. Front Plant Sci 2020; 11:524. [PMID: 32582224 PMCID: PMC7287149 DOI: 10.3389/fpls.2020.00524] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2019] [Accepted: 04/07/2020] [Indexed: 05/07/2023]
Abstract
Gene co-expression networks (GCNs) can be prepared using a variety of mathematical approaches based on data sampled across diverse developmental processes, tissue types, pathologies, mutant backgrounds, and stress conditions. These networks are used to identify genes with similar expression dynamics but are prone to introducing false-positive and false-negative relationships, especially in the instance of large and heterogenous datasets. With the aim of optimizing the relevance of edges in GCNs and enhancing global biological insight, we propose a novel approach that involves a data-centering step performed simultaneously per gene and per sub-experiment, called centralization within sub-experiments (CSE). Using a gene set encoding the plant mitochondrial proteome as a case study, our results show that all CSE-based GCNs assessed had significantly more edges within the majority of the considered functional sub-networks, such as the mitochondrial electron transport chain and its complexes, than GCNs not using CSE; thus demonstrating that CSE-based GCNs are efficient at predicting canonical functions and associated pathways, here referred to as the core gene network. Furthermore, we show that correlation analyses using CSE-processed data can be used to fine-tune prediction of the function of uncharacterized genes; while its use in combination with analyses based on non-CSE data can augment conventional stress analyses with the innate connections underpinning the dynamic system being examined. Therefore, CSE is an effective alternative method to conventional batch correction approaches, particularly when dealing with large and heterogenous datasets. The method is easy to implement into a pre-existing GCN analysis pipeline and can provide enhanced biological relevance to conventional GCNs by allowing users to delineate a core gene network.
AUTHOR SUMMARY Gene co-expression networks (GCNs) are the product of a variety of mathematical approaches that identify causal relationships in gene expression dynamics but are prone to the misdiagnoses of false-positives and false-negatives, especially in the instance of large and heterogenous datasets. In light of the burgeoning output of next-generation sequencing projects performed on a variety of species, and developmental or clinical conditions; the statistical power and complexity of these networks will undoubtedly increase, while their biological relevance will be fiercely challenged. Here, we propose a novel approach to generate a "core" GCN with enhanced biological relevance. Our method involves a data-centering step that effectively removes all primary treatment/tissue effects, which is simple to employ and can be easily implemented into pre-existing GCN analysis pipelines. The gain in biological relevance resulting from the adoption of this approach was assessed using a plant mitochondrial case study.
Affiliation(s)
- Simon R. Law
- Department of Plant Physiology, Umeå Plant Science Centre, Umeå Universitet, Umeå, Sweden
| | - Therese G. Kellgren
- Department of Mathematics and Mathematical Statistics, Umeå Universitet, Umeå, Sweden
| | - Rafael Björk
- Department of Mathematics and Mathematical Statistics, Umeå Universitet, Umeå, Sweden
| | - Patrik Ryden
- Department of Mathematics and Mathematical Statistics, Umeå Universitet, Umeå, Sweden
- *Correspondence: Patrik Ryden
| | - Olivier Keech
- Department of Plant Physiology, Umeå Plant Science Centre, Umeå Universitet, Umeå, Sweden
- *Correspondence: Olivier Keech
| |
|
43
|
Blencowe M, Arneson D, Ding J, Chen YW, Saleem Z, Yang X. Network modeling of single-cell omics data: challenges, opportunities, and progresses. Emerg Top Life Sci 2019; 3:379-398. [PMID: 32270049 PMCID: PMC7141415 DOI: 10.1042/etls20180176] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 04/16/2019] [Revised: 06/07/2019] [Accepted: 06/24/2019] [Indexed: 01/07/2023]
Abstract
Single-cell multi-omics technologies are rapidly evolving, prompting both methodological advances and biological discoveries at an unprecedented speed. Gene regulatory network modeling has been used as a powerful approach to elucidate the complex molecular interactions underlying biological processes and systems, yet its application in single-cell omics data modeling has been met with unique challenges and opportunities. In this review, we discuss these challenges and opportunities, and offer an overview of the recent development of network modeling approaches designed to capture dynamic networks, within-cell networks, and cell-cell interaction or communication networks. Finally, we outline the remaining gaps in single-cell gene network modeling and the outlooks of the field moving forward.
Affiliation(s)
- Montgomery Blencowe
- Department of Integrative Biology and Physiology, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, U.S.A
| | - Douglas Arneson
- Department of Integrative Biology and Physiology, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, U.S.A
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, U.S.A
| | - Jessica Ding
- Department of Integrative Biology and Physiology, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, U.S.A
| | - Yen-Wei Chen
- Department of Integrative Biology and Physiology, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, U.S.A
- Molecular Toxicology Program, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, U.S.A
| | - Zara Saleem
- Department of Integrative Biology and Physiology, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, U.S.A
| | - Xia Yang
- Department of Integrative Biology and Physiology, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, U.S.A
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, U.S.A
- Molecular Toxicology Program, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, U.S.A
- Institute for Quantitative and Computational Biosciences, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, U.S.A
| |
|
44
|
Chasman D, Iyer N, Fotuhi Siahpirani A, Estevez Silva M, Lippmann E, McIntosh B, Probasco MD, Jiang P, Stewart R, Thomson JA, Ashton RS, Roy S. Inferring Regulatory Programs Governing Region Specificity of Neuroepithelial Stem Cells during Early Hindbrain and Spinal Cord Development. Cell Syst 2019; 9:167-186.e12. [PMID: 31302154 DOI: 10.1016/j.cels.2019.05.012] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 12/23/2018] [Revised: 05/05/2019] [Accepted: 05/30/2019] [Indexed: 12/19/2022]
Abstract
Neuroepithelial stem cells (NSC) from different anatomical regions of the embryonic neural tube's rostrocaudal axis can differentiate into diverse central nervous system tissues, but the transcriptional regulatory networks governing these processes are incompletely understood. Here, we measure region-specific NSC gene expression along the rostrocaudal axis in a human pluripotent stem cell model of early central nervous system development over a 72-h time course, spanning the hindbrain to cervical spinal cord. We introduce Escarole, a probabilistic clustering algorithm for non-stationary time series, and combine it with prior-based regulatory network inference to identify genes that are regulated dynamically and predict their upstream regulators. We identify known regulators of patterning and neural development, including the HOX genes, and predict a direct regulatory connection between the transcription factor POU3F2 and target gene STMN2. We demonstrate that POU3F2 is required for expression of STMN2, suggesting that this regulatory connection is important for region specificity of NSCs.
Affiliation(s)
- Deborah Chasman
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715, USA
| | - Nisha Iyer
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715, USA; Department of Biomedical Engineering, University of Wisconsin-Madison, Madison, WI 53706, USA
| | - Alireza Fotuhi Siahpirani
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715, USA; Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI 53706, USA
| | - Maria Estevez Silva
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715, USA; Department of Biomedical Engineering, University of Wisconsin-Madison, Madison, WI 53706, USA
| | - Ethan Lippmann
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715, USA; Department of Biomedical Engineering, University of Wisconsin-Madison, Madison, WI 53706, USA
| | - Brian McIntosh
- Regenerative Biology Theme, Morgridge Institute for Research, Madison, WI 53715, USA
| | - Mitchell D Probasco
- Regenerative Biology Theme, Morgridge Institute for Research, Madison, WI 53715, USA
| | - Peng Jiang
- Regenerative Biology Theme, Morgridge Institute for Research, Madison, WI 53715, USA
| | - Ron Stewart
- Regenerative Biology Theme, Morgridge Institute for Research, Madison, WI 53715, USA
| | - James A Thomson
- Regenerative Biology Theme, Morgridge Institute for Research, Madison, WI 53715, USA
| | - Randolph S Ashton
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715, USA; Department of Biomedical Engineering, University of Wisconsin-Madison, Madison, WI 53706, USA
| | - Sushmita Roy
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715, USA; Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53792, USA
| |
|
45
|
Miraldi ER, Pokrovskii M, Watters A, Castro DM, De Veaux N, Hall JA, Lee JY, Ciofani M, Madar A, Carriero N, Littman DR, Bonneau R. Leveraging chromatin accessibility for transcriptional regulatory network inference in T Helper 17 Cells. Genome Res 2019; 29:449-463. [PMID: 30696696 PMCID: PMC6396413 DOI: 10.1101/gr.238253.118] [Citation(s) in RCA: 60] [Impact Index Per Article: 12.0] [Received: 04/08/2018] [Accepted: 01/15/2019] [Indexed: 12/13/2022]
Abstract
Transcriptional regulatory networks (TRNs) provide insight into cellular behavior by describing interactions between transcription factors (TFs) and their gene targets. The assay for transposase-accessible chromatin (ATAC)–seq, coupled with TF motif analysis, provides indirect evidence of chromatin binding for hundreds of TFs genome-wide. Here, we propose methods for TRN inference in a mammalian setting, using ATAC-seq data to improve gene expression modeling. We test our methods in the context of T Helper Cell Type 17 (Th17) differentiation, generating new ATAC-seq data to complement existing Th17 genomic resources. In this resource-rich mammalian setting, our extensive benchmarking provides quantitative, genome-scale evaluation of TRN inference, combining ATAC-seq and RNA-seq data. We refine and extend our previous Th17 TRN, using our new TRN inference methods to integrate all Th17 data (gene expression, ATAC-seq, TF knockouts, and ChIP-seq). We highlight newly discovered roles for individual TFs and groups of TFs (“TF–TF modules”) in Th17 gene regulation. Given the popularity of ATAC-seq, which provides high-resolution with low sample input requirements, we anticipate that our methods will improve TRN inference in new mammalian systems, especially in vivo, for cells directly from humans and animal models.
Affiliation(s)
- Emily R Miraldi
- Divisions of Immunobiology and Biomedical Informatics, Cincinnati Children's Hospital, Cincinnati, Ohio 45229, USA; Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio 45257, USA
| | - Maria Pokrovskii
- Molecular Pathogenesis Program, The Kimmel Center for Biology and Medicine of the Skirball Institute, New York, New York 10016, USA
| | - Aaron Watters
- Center for Computational Biology, Flatiron Institute, New York, New York 10010, USA
| | - Dayanne M Castro
- Department of Biology, New York University, New York, New York 10012, USA
| | - Nicholas De Veaux
- Center for Computational Biology, Flatiron Institute, New York, New York 10010, USA
| | - Jason A Hall
- Molecular Pathogenesis Program, The Kimmel Center for Biology and Medicine of the Skirball Institute, New York, New York 10016, USA
| | - June-Yong Lee
- Molecular Pathogenesis Program, The Kimmel Center for Biology and Medicine of the Skirball Institute, New York, New York 10016, USA
| | - Maria Ciofani
- Department of Immunology, Duke University School of Medicine, Durham, North Carolina 27710, USA
| | - Aviv Madar
- Department of Biology, New York University, New York, New York 10012, USA
| | - Nick Carriero
- Center for Computational Biology, Flatiron Institute, New York, New York 10010, USA
| | - Dan R Littman
- Molecular Pathogenesis Program, The Kimmel Center for Biology and Medicine of the Skirball Institute, New York, New York 10016, USA; The Howard Hughes Medical Institute
| | - Richard Bonneau
- Center for Computational Biology, Flatiron Institute, New York, New York 10010, USA; Department of Biology, New York University, New York, New York 10012, USA; Center for Data Science, New York University, New York, New York 10010, USA
| |
|