1
Pan C, Chen Y. Informeasure: an R/bioconductor package for quantifying nonlinear dependence between variables in biological networks from an information theory perspective. BMC Bioinformatics 2024; 25:382. [PMID: 39695935 DOI: 10.1186/s12859-024-05996-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2024] [Accepted: 11/21/2024] [Indexed: 12/20/2024]
Abstract
BACKGROUND Using information measures to infer biological regulatory networks can capture nonlinear relationships between variables. However, it is computationally challenging, and there is a lack of convenient tools. RESULTS We introduce Informeasure, an R package designed to quantify nonlinear dependencies in biological regulatory networks from an information theory perspective. This package compiles a comprehensive set of information measurements, including mutual information, conditional mutual information, interaction information, partial information decomposition, and part mutual information. Mutual information is used for bivariate network inference, while the other four estimators are dedicated to trivariate network analysis. CONCLUSIONS Informeasure is a turnkey solution, allowing users to utilize these information measures immediately upon installation. Informeasure is available as an R/Bioconductor package at https://bioconductor.org/packages/Informeasure .
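The measures named in this abstract can be illustrated with simple plug-in (empirical frequency) estimators on discrete data. The sketch below is an illustration in Python, not the Informeasure R API; the function names are hypothetical, and the sign convention used for interaction information is one of several in the literature.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Plug-in joint Shannon entropy (bits) of one or more discrete columns."""
    n = len(columns[0])
    counts = Counter(zip(*columns))
    return -sum((c / n) * log2(c / n) for c in counts.values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def conditional_mutual_information(x, y, z):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)


def interaction_information(x, y, z):
    """Assumed convention: I(X;Y|Z) - I(X;Y); some authors use the opposite sign."""
    return conditional_mutual_information(x, y, z) - mutual_information(x, y)


# XOR toy example: X and Y are independent, but become fully dependent given Z.
x = [0, 0, 1, 1]
y = [0, 1, 0, 1]
z = [a ^ b for a, b in zip(x, y)]

print(mutual_information(x, y))                 # 0.0 bits
print(conditional_mutual_information(x, y, z))  # 1.0 bit
```

The XOR case shows why trivariate measures matter: a pairwise mutual information of zero would miss a relationship that conditioning on the third variable reveals.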
Affiliation(s)
- Chu Pan
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan, China
- Yanlin Chen
- School of Software, Henan University of Engineering, Zhengzhou, Henan, China

2
Barman S, Farid FA, Gope HL, Hafiz MFB, Khan NA, Ahmad S, Mansor S. LBF-MI: Limited Boolean Functions and Mutual Information to Infer a Gene Regulatory Network from Time-Series Gene Expression Data. Genes (Basel) 2024; 15:1530. [PMID: 39766797 PMCID: PMC11675687 DOI: 10.3390/genes15121530] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2024] [Revised: 11/23/2024] [Accepted: 11/26/2024] [Indexed: 01/11/2025]
Abstract
BACKGROUND In systems biology, inferring a gene regulatory network from time-series gene expression data is challenging. Numerous Boolean network inference techniques have emerged for reconstructing a gene regulatory network from a time-series gene expression dataset. However, most of these techniques scale poorly because they can consider only two or three regulatory genes per target gene. METHODS To overcome this limitation, a novel inference method, LBF-MI, is proposed in this research. This two-phase method uses limited Boolean functions and multivariate mutual information to reconstruct a Boolean gene regulatory network from time-series gene expression data. Initially, Boolean functions are applied to determine the optimum solutions. If this fails, multivariate mutual information is applied to obtain the optimum solutions. RESULTS This research conducted a performance-comparison experiment between LBF-MI and three other methods: mutual information-based Boolean network inference, context likelihood relatedness, and relevance network. When examined on artificial as well as real time-series gene expression data, the results showed that the proposed LBF-MI method outperformed mutual information-based Boolean network inference, context likelihood relatedness, and relevance network on artificial datasets and on two real Escherichia coli datasets (E. coli gene regulatory network, and SOS response of E. coli regulatory network). CONCLUSIONS LBF-MI's superior performance in gene regulatory network inference enables researchers to uncover the regulatory mechanisms and cellular behaviors of various organisms.
Affiliation(s)
- Shohag Barman
- Department of Computer Science and Engineering, Bangabandhu Sheikh Mujibur Rahman Science and Technology University, Pirojpur 8500, Bangladesh
- Fahmid Al Farid
- Faculty of Engineering, Multimedia University, Cyberjaya 63000, Selangor, Malaysia
- Hira Lal Gope
- Faculty of Agricultural Engineering and Technology, Sylhet Agricultural University, Sylhet 3100, Bangladesh
- Md. Ferdous Bin Hafiz
- Department of Computer Science and Engineering, Southeast University, Dhaka 1208, Bangladesh
- Niaz Ashraf Khan
- Department of Computer Science and Engineering, University of Liberal Arts Bangladesh, Dhaka 1207, Bangladesh
- Sabbir Ahmad
- Department of Computer Science and Engineering, University of Chittagong, Chittagong 4331, Bangladesh
- Sarina Mansor
- Faculty of Engineering, Multimedia University, Cyberjaya 63000, Selangor, Malaysia

3
Zhou X, Pan J, Chen L, Zhang S, Chen Y. DeepIMAGER: Deeply Analyzing Gene Regulatory Networks from scRNA-seq Data. Biomolecules 2024; 14:766. [PMID: 39062480 PMCID: PMC11274664 DOI: 10.3390/biom14070766] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2024] [Revised: 06/22/2024] [Accepted: 06/25/2024] [Indexed: 07/28/2024]
Abstract
Understanding the dynamics of gene regulatory networks (GRNs) across diverse cell types poses a challenge yet holds immense value in unraveling the molecular mechanisms governing cellular processes. Current computational methods, which rely solely on expression changes from bulk RNA-seq and/or scRNA-seq data, often result in high rates of false positives and low precision. Here, we introduce an advanced computational tool, DeepIMAGER, for inferring cell-specific GRNs through deep learning and data integration. DeepIMAGER employs a supervised approach that transforms the co-expression patterns of gene pairs into image-like representations and leverages transcription factor (TF) binding information for model training. It is trained using comprehensive datasets that encompass scRNA-seq profiles and ChIP-seq data, capturing TF-gene pair information across various cell types. Comprehensive validations on six cell lines show that DeepIMAGER outperforms ten popular GRN inference tools and is remarkably robust to dropout-zero events. DeepIMAGER was applied to scRNA-seq datasets of multiple myeloma (MM) and detected potential GRNs for the TFs RORC, MITF, and FOXD2 in MM dendritic cells. This technical innovation, combined with its capability to accurately decode GRNs from scRNA-seq, establishes DeepIMAGER as a valuable tool for unraveling complex regulatory networks in various cell types.
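One common way to turn the co-expression pattern of a gene pair into an image-like input is a normalized two-dimensional histogram over cells. The sketch below assumes that approach; the `pair_histogram` name, the equal-width binning, and the bin count are illustrative choices and may differ from what DeepIMAGER actually does.

```python
def pair_histogram(x, y, bins=8):
    """Normalized 2D histogram (bins x bins) of paired expression values.

    x, y: expression of two genes across the same cells (equal-length lists).
    Returns a nested list whose entries sum to 1.
    """
    def bin_index(value, low, high):
        if high == low:  # constant gene: everything falls in the first bin
            return 0
        k = int((value - low) / (high - low) * bins)
        return min(k, bins - 1)  # the maximum value belongs to the last bin

    low_x, high_x = min(x), max(x)
    low_y, high_y = min(y), max(y)
    grid = [[0] * bins for _ in range(bins)]
    for a, b in zip(x, y):
        grid[bin_index(a, low_x, high_x)][bin_index(b, low_y, high_y)] += 1
    n = len(x)
    return [[count / n for count in row] for row in grid]


# Perfectly co-expressed toy genes concentrate mass on the diagonal.
image = pair_histogram([0, 1, 2, 3], [0, 1, 2, 3], bins=2)
print(image)  # [[0.5, 0.0], [0.0, 0.5]]
```

Such a grid can then be fed to a convolutional classifier, with TF binding evidence supplying the labels for supervised training.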
Affiliation(s)
- Xiguo Zhou
- College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
- Jingyi Pan
- College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
- Liang Chen
- College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
- Shaoqiang Zhang
- College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
- Yong Chen
- Department of Biological and Biomedical Sciences, Rowan University, Glassboro, NJ 08028, USA

4
Wu S, Jin K, Tang M, Xia Y, Gao W. Inference of Gene Regulatory Networks Based on Multi-view Hierarchical Hypergraphs. Interdiscip Sci 2024; 16:318-332. [PMID: 38342857 DOI: 10.1007/s12539-024-00604-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Revised: 11/26/2023] [Accepted: 01/03/2024] [Indexed: 02/13/2024]
Abstract
Since gene regulation is a complex process in which multiple genes act simultaneously, accurately inferring gene regulatory networks (GRNs) is a long-standing challenge in systems biology. Although graph neural networks can formally describe intricate gene expression mechanisms, current GRN inference methods based on graph learning regard only transcription factor (TF)-target gene interactions as pairwise relationships, and cannot model the many-to-many high-order regulatory patterns that prevail among genes. Moreover, these methods often rely on limited prior regulatory knowledge, ignoring the structural information of GRNs in gene expression profiles. Therefore, we propose a multi-view hierarchical hypergraphs GRN (MHHGRN) inference model. Specifically, multiple heterogeneous biological information is integrated to construct multi-view hierarchical hypergraphs of TFs and target genes, using hypergraph convolution networks to model higher order complex regulatory relationships. Meanwhile, the coupled information diffusion mechanism and the cross-domain messaging mechanism facilitate the information sharing between genes to optimise gene embedding representations. Finally, a unique channel attention mechanism is used to adaptively learn feature representations from multiple views for GRN inference. Experimental results show that MHHGRN achieves better results than the baseline methods on the E. coli and S. cerevisiae benchmark datasets of the DREAM5 challenge, and it has excellent cross-species generalization, achieving comparable or better performance on scRNA-seq datasets from five mouse and two human cell lines.
Affiliation(s)
- Songyang Wu
- School of Information Science and Technology, Yunnan Normal University, Kunming, 650500, China
- Kui Jin
- School of Information Science and Technology, Yunnan Normal University, Kunming, 650500, China
- Mingjing Tang
- School of Life Science, Yunnan Normal University, Kunming, 650500, China
- Engineering Research Center of Sustainable Development and Utilization of Biomass Energy, Ministry of Education, Yunnan Normal University, Kunming, 650500, China
- Yuelong Xia
- School of Information Science and Technology, Yunnan Normal University, Kunming, 650500, China
- Wei Gao
- School of Information Science and Technology, Yunnan Normal University, Kunming, 650500, China

5
Wang Y, Liu M, Jafari M, Tang J. A critical assessment of Traditional Chinese Medicine databases as a source for drug discovery. Front Pharmacol 2024; 15:1303693. [PMID: 38738181 PMCID: PMC11082401 DOI: 10.3389/fphar.2024.1303693] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2023] [Accepted: 04/15/2024] [Indexed: 05/14/2024]
Abstract
Traditional Chinese Medicine (TCM) has been used for thousands of years to treat human diseases. Recently, many databases have been devoted to studying TCM pharmacology. Most of these databases include information about the active ingredients of TCM herbs and their disease indications. These databases enable researchers to interrogate the mechanisms of action of TCM systematically. However, there is a need for comparative studies of these databases, as they are derived from various resources with different data processing methods. In this review, we provide a comprehensive analysis of the existing TCM databases. By comparing herbs, ingredients, and herb-ingredient pairs across these databases, we found that their contents complement one another. Therefore, data harmonization is vital to make full use of all the available information. Moreover, different TCM databases may contain various annotation types for herbs or ingredients, notably for the chemical structure of ingredients, making it challenging to integrate data from them. We also highlight the latest TCM databases on symptoms or gene expression, suggesting that using multi-omics data and advanced bioinformatics approaches may provide new insights for drug discovery in TCM. In summary, such a comparative study would help improve the understanding of data complexity, which may ultimately motivate more efficient and more standardized strategies towards the digitalization of TCM.
Affiliation(s)
- Yinyin Wang
- School of Traditional Chinese Pharmacy, China Pharmaceutical University, Nanjing, China
- Minxia Liu
- Faculty of Life Science, Anhui Medical University, Hefei, China
- Mohieddin Jafari
- Department Biochemistry and Developmental Biology, University of Helsinki, Helsinki, Finland
- Jing Tang
- Department Biochemistry and Developmental Biology, University of Helsinki, Helsinki, Finland
- Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Helsinki, Finland

6
Mousavi R, Lobo D. Automatic design of gene regulatory mechanisms for spatial pattern formation. NPJ Syst Biol Appl 2024; 10:35. [PMID: 38565850 PMCID: PMC10987498 DOI: 10.1038/s41540-024-00361-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2023] [Accepted: 03/19/2024] [Indexed: 04/04/2024]
Abstract
Gene regulatory mechanisms (GRMs) control the formation of spatial and temporal expression patterns that can serve as regulatory signals for the development of complex shapes. Synthetic developmental biology aims to engineer such genetic circuits for understanding and producing desired multicellular spatial patterns. However, designing synthetic GRMs for complex, multi-dimensional spatial patterns is a current challenge due to the nonlinear interactions and feedback loops in genetic circuits. Here we present a methodology to automatically design GRMs that can produce any given two-dimensional spatial pattern. The proposed approach uses two orthogonal morphogen gradients acting as positional information signals in a multicellular tissue area or culture, which constitutes a continuous field of engineered cells implementing the same designed GRM. To efficiently design both the circuit network and the interaction mechanisms (including the number of genes necessary for the formation of the target spatial pattern), we developed an automated algorithm based on high-performance evolutionary computation. The tolerance of the algorithm can be configured to design GRMs that are either simple to produce approximate patterns or complex to produce precise patterns. We demonstrate the approach by automatically designing GRMs that can produce a diverse set of synthetic spatial expression patterns by interpreting just two orthogonal morphogen gradients. The proposed framework offers a versatile approach to systematically design and discover complex genetic circuits producing spatial patterns.
Affiliation(s)
- Reza Mousavi
- Department of Biological Sciences, University of Maryland, Baltimore County, Baltimore, MD, USA
- Daniel Lobo
- Department of Biological Sciences, University of Maryland, Baltimore County, Baltimore, MD, USA
- Greenebaum Comprehensive Cancer Center and Center for Stem Cell Biology & Regenerative Medicine, University of Maryland, Baltimore, Baltimore, MD, USA

7
Mousavi R, Lobo D. Automatic design of gene regulatory mechanisms for spatial pattern formation. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.07.26.550573. [PMID: 37546866 PMCID: PMC10402059 DOI: 10.1101/2023.07.26.550573] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/08/2023]
Abstract
Synthetic developmental biology aims to engineer gene regulatory mechanisms (GRMs) for understanding and producing desired multicellular patterns and shapes. However, designing GRMs for spatial patterns is a current challenge due to the nonlinear interactions and feedback loops in genetic circuits. Here we present a methodology to automatically design GRMs that can produce any given spatial pattern. The proposed approach uses two orthogonal morphogen gradients acting as positional information signals in a multicellular tissue area or culture, which constitutes a continuous field of engineered cells implementing the same designed GRM. To efficiently design both the circuit network and the interaction mechanisms (including the number of genes necessary for the formation of the target pattern), we developed an automated algorithm based on high-performance evolutionary computation. The tolerance of the algorithm can be configured to design GRMs that are either simple to produce approximate patterns or complex to produce precise patterns. We demonstrate the approach by automatically designing GRMs that can produce a diverse set of synthetic spatial expression patterns by interpreting just two orthogonal morphogen gradients. The proposed framework offers a versatile approach to systematically design and discover pattern-producing genetic circuits.
Affiliation(s)
- Reza Mousavi
- Department of Biological Sciences, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250, USA
- Daniel Lobo
- Department of Biological Sciences, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250, USA
- Greenebaum Comprehensive Cancer Center and Center for Stem Cell Biology & Regenerative Medicine, University of Maryland, School of Medicine, 22 S. Greene Street, Baltimore, MD 21201, USA

8
Marku M, Pancaldi V. From time-series transcriptomics to gene regulatory networks: A review on inference methods. PLoS Comput Biol 2023; 19:e1011254. [PMID: 37561790 PMCID: PMC10414591 DOI: 10.1371/journal.pcbi.1011254] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 08/12/2023]
Abstract
Inference of gene regulatory networks has been an active area of research for around 20 years, leading to the development of sophisticated inference algorithms based on a variety of assumptions and approaches. With the ever increasing demand for more accurate and powerful models, the inference problem remains of broad scientific interest. The abstract representation of biological systems through gene regulatory networks represents a powerful method to study such systems, encoding different amounts and types of information. In this review, we summarize the different types of inference algorithms specifically based on time-series transcriptomics, giving an overview of the main applications of gene regulatory networks in computational biology. This review is intended to give an updated reference of regulatory networks inference tools to biologists and researchers new to the topic and guide them in selecting the appropriate inference method that best fits their questions, aims, and experimental data.
Affiliation(s)
- Malvina Marku
- CRCT, Université de Toulouse, Inserm, CNRS, Université Toulouse III-Paul Sabatier, Centre de Recherches en Cancérologie de Toulouse, Toulouse, France
- Vera Pancaldi
- CRCT, Université de Toulouse, Inserm, CNRS, Université Toulouse III-Paul Sabatier, Centre de Recherches en Cancérologie de Toulouse, Toulouse, France
- Barcelona Supercomputing Center, Barcelona, Spain

9
Mbebi AJ, Nikoloski Z. Gene regulatory network inference using mixed-norms regularized multivariate model with covariance selection. PLoS Comput Biol 2023; 19:e1010832. [PMID: 37523414 PMCID: PMC10414675 DOI: 10.1371/journal.pcbi.1010832] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/21/2022] [Revised: 08/10/2023] [Accepted: 07/11/2023] [Indexed: 08/02/2023]
Abstract
Despite extensive research efforts, reconstruction of gene regulatory networks (GRNs) from transcriptomics data remains a pressing challenge in systems biology. While non-linear approaches for reconstruction of GRNs show improved performance over simpler alternatives, it is not yet understood whether joint modelling of multiple target genes may improve performance, even under linearity assumptions. To address this problem, we propose two novel approaches that cast the GRN reconstruction problem as a blend between regularized multivariate regression and graphical models that combine the L2,1-norm with classical regularization techniques. We used data and networks from the DREAM5 challenge to show that the proposed models provide consistently good performance in comparison to contenders whose performance varies across data sets from simulation and from experiments in the model unicellular organisms Escherichia coli and Saccharomyces cerevisiae. Since the models' formulation facilitates the prediction of master regulators, we also used the resulting findings to identify master regulators over all data sets as well as their plasticity across different environments. Our results demonstrate that the identified master regulators are in line with experimental evidence from the model bacterium E. coli. Together, our study demonstrates that simultaneous modelling of several target genes results in improved inference of GRNs and can be used as an alternative in different applications.
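For readers unfamiliar with the L2,1-norm mentioned above, it is the sum of the Euclidean norms of a matrix's rows, so penalizing it tends to zero out whole rows at once. The sketch below computes only the norm; the row-wise convention and the reading of rows as regulators are assumptions for illustration, and the snippet is not the authors' model.

```python
from math import sqrt


def l21_norm(matrix):
    """L2,1-norm: sum over rows of each row's Euclidean (L2) norm."""
    return sum(sqrt(sum(value * value for value in row)) for row in matrix)


# Hypothetical coefficient matrix: rows = candidate regulators, columns = target genes.
coefficients = [
    [3.0, 4.0],  # regulator acting on both targets, row norm 5
    [0.0, 0.0],  # regulator with no effect, row norm 0
    [1.0, 0.0],  # regulator acting on one target, row norm 1
]
print(l21_norm(coefficients))  # 6.0
```

Because each row contributes its full L2 norm, a regulator is cheapest to keep at exactly zero across all targets, which is one reason row-sparse penalties are used when several target genes are modelled jointly.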
Affiliation(s)
- Alain J. Mbebi
- Bioinformatics Department, Institute of Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Str. 24-25, Germany
- Systems Biology and Mathematical Modeling Group, Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, Germany
- Zoran Nikoloski
- Bioinformatics Department, Institute of Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Str. 24-25, Germany
- Systems Biology and Mathematical Modeling Group, Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, Germany

10
Shachaf LI, Roberts E, Cahan P, Xiao J. Gene regulation network inference using k-nearest neighbor-based mutual information estimation: revisiting an old DREAM. BMC Bioinformatics 2023; 24:84. [PMID: 36879188 PMCID: PMC9990267 DOI: 10.1186/s12859-022-05047-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/14/2021] [Accepted: 11/08/2022] [Indexed: 03/08/2023]
Abstract
BACKGROUND A cell exhibits a variety of responses to internal and external cues. These responses are possible, in part, due to the presence of an elaborate gene regulatory network (GRN) in every single cell. In the past 20 years, many groups worked on reconstructing the topological structure of GRNs from large-scale gene expression data using a variety of inference algorithms. Insights gained about participating players in GRNs may ultimately lead to therapeutic benefits. Mutual information (MI) is a widely used metric within this inference/reconstruction pipeline as it can detect any correlation (linear and non-linear) between any number of variables (n-dimensions). However, the use of MI with continuous data (for example, normalized fluorescence intensity measurement of gene expression levels) is sensitive to data size, correlation strength and underlying distributions, and often requires laborious and, at times, ad hoc optimization. RESULTS In this work, we first show that estimating MI of a bi- and tri-variate Gaussian distribution using k-nearest neighbor (kNN) MI estimation results in significant error reduction as compared to commonly used methods based on fixed binning. Second, we demonstrate that implementing the MI-based kNN Kraskov-Stögbauer-Grassberger (KSG) algorithm leads to a significant improvement in GRN reconstruction for popular inference algorithms, such as Context Likelihood of Relatedness (CLR). Finally, through extensive in-silico benchmarking we show that a new inference algorithm CMIA (Conditional Mutual Information Augmentation), inspired by CLR, in combination with the KSG-MI estimator, outperforms commonly used methods. CONCLUSIONS Using three canonical datasets containing 15 synthetic networks, the newly developed method for GRN reconstruction, which combines CMIA and the KSG-MI estimator, achieves an improvement of 20-35% in precision-recall measures over the current gold standard in the field. This new method will enable researchers to discover new gene interactions or better choose gene candidates for experimental validations.
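The Context Likelihood of Relatedness step mentioned in this abstract rescales each pairwise MI value against the background of MI values for the two genes involved. The sketch below follows the commonly cited formulation (clipped z-scores combined as a root sum of squares); excluding the diagonal from the background and using the population standard deviation are assumptions here, and the code is neither the CMIA algorithm nor the authors' implementation.

```python
from math import sqrt
from statistics import mean, pstdev


def clipped_z(value, mu, sigma):
    """Z-score of value against a background, clipped at zero."""
    if sigma == 0:
        return 0.0
    return max(0.0, (value - mu) / sigma)


def clr_scores(mi):
    """CLR-style scores from a symmetric mutual-information matrix (list of lists)."""
    n = len(mi)
    # Background for gene i: its MI with every other gene (diagonal excluded).
    background = []
    for i in range(n):
        row = [mi[i][j] for j in range(n) if j != i]
        background.append((mean(row), pstdev(row)))
    scores = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            z_i = clipped_z(mi[i][j], *background[i])
            z_j = clipped_z(mi[i][j], *background[j])
            scores[i][j] = scores[j][i] = sqrt(z_i ** 2 + z_j ** 2)
    return scores


# Toy MI matrix for four genes: pairs (0, 1) and (2, 3) stand out from their backgrounds.
mi_matrix = [
    [0.0, 0.9, 0.1, 0.1],
    [0.9, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.0, 0.2],
    [0.1, 0.1, 0.2, 0.0],
]
scores = clr_scores(mi_matrix)
```

In this toy matrix the pair (2, 3) scores as highly as (0, 1) despite a much smaller raw MI, because CLR judges each value relative to the genes' own backgrounds. Any MI estimator, binned or kNN-based, could supply the input matrix.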
Affiliation(s)
- Lior I Shachaf
- Department of Biophysics, Johns Hopkins University, 3400 N. Charles Street, Baltimore, MD, 21218, USA
- Elijah Roberts
- Department of Biophysics, Johns Hopkins University, 3400 N. Charles Street, Baltimore, MD, 21218, USA
- 10x Genomics, 6230 Stoneridge Mall Road, Pleasanton, CA, 94588-3260, USA
- Patrick Cahan
- Department of Biomedical Engineering, Department of Molecular Biology and Genetics, Institute for Cell Engineering, Johns Hopkins School of Medicine, 733 N. Broadway, Baltimore, MD, 21205, USA
- Jie Xiao
- Department of Biophysics and Biophysical Chemistry, Johns Hopkins School of Medicine, 725 N. Wolfe Street, WBSB 708, Baltimore, MD, 21205, USA

11
Seçilmiş D, Hillerton T, Tjärnberg A, Nelander S, Nordling TEM, Sonnhammer ELL. Knowledge of the perturbation design is essential for accurate gene regulatory network inference. Sci Rep 2022; 12:16531. [PMID: 36192495 PMCID: PMC9529923 DOI: 10.1038/s41598-022-19005-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/12/2021] [Accepted: 08/23/2022] [Indexed: 11/08/2022]
Abstract
The gene regulatory network (GRN) of a cell executes genetic programs in response to environmental and internal cues. Two distinct classes of methods are used to infer regulatory interactions from gene expression: those that only use observed changes in gene expression, and those that use both the observed changes and the perturbation design, i.e. the targets used to cause the changes in gene expression. Considering that the GRN by definition converts input cues to changes in gene expression, it may be conjectured that the latter methods would yield more accurate inferences but this has not previously been investigated. To address this question, we evaluated a number of popular GRN inference methods that either use the perturbation design or not. For the evaluation we used targeted perturbation knockdown gene expression datasets with varying noise levels generated by two different packages, GeneNetWeaver and GeneSpider. The accuracy was evaluated on each dataset using a variety of measures. The results show that on all datasets, methods using the perturbation design matrix consistently and significantly outperform methods not using it. This was also found to be the case on a smaller experimental dataset from E. coli. Targeted gene perturbations combined with inference methods that use the perturbation design are indispensable for accurate GRN inference.
Affiliation(s)
- Deniz Seçilmiş
- Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 17121, Solna, Sweden
- Thomas Hillerton
- Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 17121, Solna, Sweden
- Andreas Tjärnberg
- Center for Developmental Genetics, New York University, New York, USA
- Sven Nelander
- Department of Immunology, Genetics and Pathology and Science for Life Laboratory, Uppsala University, 75185, Uppsala, Sweden
- Torbjörn E M Nordling
- Department of Mechanical Engineering, National Cheng Kung University, Tainan, 701, Taiwan, ROC
- Department of Applied Physics and Electronics, Umeå University, 90187, Umeå, Sweden
- Erik L L Sonnhammer
- Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 17121, Solna, Sweden

12
Yan J, Wang X. Unsupervised and semi-supervised learning: the next frontier in machine learning for plant systems biology. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2022; 111:1527-1538. [PMID: 35821601 DOI: 10.1111/tpj.15905] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 04/20/2022] [Revised: 07/05/2022] [Accepted: 07/07/2022] [Indexed: 06/15/2023]
Abstract
Advances in high-throughput omics technologies are leading plant biology research into the era of big data. Machine learning (ML) performs an important role in plant systems biology because of its excellent performance and wide application in the analysis of big data. However, to achieve ideal performance, supervised ML algorithms require large numbers of labeled samples as training data. In some cases, it is impossible or prohibitively expensive to obtain enough labeled training data; here, the paradigms of unsupervised learning (UL) and semi-supervised learning (SSL) play an indispensable role. In this review, we first introduce the basic concepts of ML techniques, as well as some representative UL and SSL algorithms, including clustering, dimensionality reduction, self-supervised learning (self-SL), positive-unlabeled (PU) learning and transfer learning. We then review recent advances and applications of UL and SSL paradigms in both plant systems biology and plant phenotyping research. Finally, we discuss the limitations and highlight the significance and challenges of UL and SSL strategies in plant systems biology.
Affiliation(s)
- Jun Yan
- Frontiers Science Center for Molecular Design Breeding, China Agricultural University, Beijing, 100094, China
- National Maize Improvement Center, College of Agronomy and Biotechnology, China Agricultural University, Beijing, 100094, China
- Xiangfeng Wang
- Frontiers Science Center for Molecular Design Breeding, China Agricultural University, Beijing, 100094, China
- National Maize Improvement Center, College of Agronomy and Biotechnology, China Agricultural University, Beijing, 100094, China

13
IoMT-Based Mitochondrial and Multifactorial Genetic Inheritance Disorder Prediction Using Machine Learning. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2022:2650742. [PMID: 35909844 PMCID: PMC9334098 DOI: 10.1155/2022/2650742] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2022] [Accepted: 07/04/2022] [Indexed: 11/18/2022]
Abstract
A genetic disorder is a serious disease that affects a large number of individuals around the world. There are various types of genetic illnesses; here, we focus on predicting mitochondrial and multifactorial genetic disorders. Genetic illness is caused by a number of factors, including a defective maternal or paternal gene, excessive abortions, a lack of blood cells, and low white blood cell count. For premature or teenage life development, early detection of genetic diseases is crucial. Although it is difficult to forecast genetic disorders ahead of time, this prediction is critical since a person's life progress depends on it. Machine learning algorithms are used to diagnose genetic disorders with high accuracy utilizing datasets collected and constructed from a large number of patient medical reports. Many recent studies have employed genome sequencing for illness detection, but fewer have used patient medical history, and the accuracy of existing studies that use a patient's history is limited. The internet of medical things (IoMT)-based model for genetic disease prediction proposed in this article uses two separate machine learning algorithms: support vector machine (SVM) and K-Nearest Neighbor (KNN). Experimental results show that SVM outperformed KNN and existing prediction methods in terms of accuracy. SVM achieved an accuracy of 94.99% and 86.6% for training and testing, respectively.
|
14
|
Obayashi T, Hibara H, Kagaya Y, Aoki Y, Kinoshita K. ATTED-II v11: A Plant Gene Coexpression Database Using a Sample Balancing Technique by Subagging of Principal Components. PLANT & CELL PHYSIOLOGY 2022; 63:869-881. [PMID: 35353884 DOI: 10.1093/pcp/pcac041] [Citation(s) in RCA: 49] [Impact Index Per Article: 16.3] [Received: 09/16/2021] [Revised: 02/06/2022] [Accepted: 03/29/2022] [Indexed: 05/25/2023]
Abstract
ATTED-II (https://atted.jp) is a gene coexpression database for nine plant species based on publicly available RNAseq and microarray data. One of the challenges in constructing condition-independent coexpression data based on publicly available gene expression data is managing the inherent sampling bias. Here, we report ATTED-II version 11, wherein we adopted a coexpression calculation methodology to balance the samples using principal component analysis and ensemble calculation. This approach has two advantages. First, omitting principal components with low contribution rates reduces the main contributors of noise. Second, balancing large differences in contribution rates enables considering various sample conditions entirely. In addition, based on RNAseq- and microarray-based coexpression data, we provide species-representative, integrated coexpression information to enhance the efficiency of interspecies comparison of the coexpression data. These coexpression data are provided as a standardized z-score to facilitate integrated analysis with different data sources. We believe that with these improvements, ATTED-II is more valuable and powerful for supporting interspecies comparative studies and integrated analyses using heterogeneous data.
Affiliation(s)
- Takeshi Obayashi
- Graduate School of Information Sciences, Tohoku University, 6-3-09, Aramaki-Aza-Aoba, Aoba-ku, Sendai, 980-8679 Japan
| | - Himiko Hibara
- Graduate School of Information Sciences, Tohoku University, 6-3-09, Aramaki-Aza-Aoba, Aoba-ku, Sendai, 980-8679 Japan
| | - Yuki Kagaya
- Graduate School of Information Sciences, Tohoku University, 6-3-09, Aramaki-Aza-Aoba, Aoba-ku, Sendai, 980-8679 Japan
| | - Yuichi Aoki
- Graduate School of Information Sciences, Tohoku University, 6-3-09, Aramaki-Aza-Aoba, Aoba-ku, Sendai, 980-8679 Japan
- Tohoku Medical Megabank Organization, Tohoku University, 2-1 Seiryo-machi, Aoba-ku, Sendai, 980-8573 Japan
| | - Kengo Kinoshita
- Graduate School of Information Sciences, Tohoku University, 6-3-09, Aramaki-Aza-Aoba, Aoba-ku, Sendai, 980-8679 Japan
- Tohoku Medical Megabank Organization, Tohoku University, 2-1 Seiryo-machi, Aoba-ku, Sendai, 980-8573 Japan
- Institute of Development, Aging, and Cancer, Tohoku University, 4-1 Seiryo-machi, Aoba-ku, Sendai, 980-8575 Japan
| |
|
15
|
Yang B, Bao W, Chen B, Song D. Single_cell_GRN: gene regulatory network identification based on supervised learning method and Single-cell RNA-seq data. BioData Min 2022; 15:13. [PMID: 35690842 PMCID: PMC9188720 DOI: 10.1186/s13040-022-00297-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2022] [Accepted: 05/22/2022] [Indexed: 11/30/2022] Open
Abstract
Single-cell RNA-seq overcomes the shortcomings of conventional transcriptome sequencing technology and could provide a powerful tool for distinguishing the transcriptome characteristics of various cell types in biological tissues and for comprehensively revealing the heterogeneity of gene expression between cells. Many intelligent computing methods have been presented to infer gene regulatory networks (GRNs) from single-cell RNA-seq data. In this paper, we investigate the performance of seven classifiers, namely support vector machine (SVM), random forest (RF), Naive Bayes (NB), GBDT, logistic regression (LR), decision tree (DT) and K-Nearest Neighbor (KNN), for solving the binary classification problems of GRN inference with single-cell RNA-seq data (Single_cell_GRN). In SVM, three different kernel functions (linear, polynomial and radial basis function) are utilized. Three real single-cell RNA-seq datasets from mouse and human are utilized. The experimental results show that in most cases the supervised learning methods (SVM, RF, NB, GBDT, LR, DT and KNN) perform better than the unsupervised learning method (GENIE3) in terms of AUC. SVM, RF and KNN perform better than the other four classifiers. In SVM, linear and polynomial kernels are better suited to modeling single-cell RNA-seq data.
Affiliation(s)
- Bin Yang
- School of Information Science and Engineering, Zaozhuang University, Zaozhuang, 277160, China
| | - Wenzheng Bao
- School of Information and Electrical Engineering, Xuzhou University of Technology, Xuzhou, 221018, China.
| | - Baitong Chen
- Xuzhou First People's Hospital, Xuzhou, 221000, China
| | - Dan Song
- School of Information Science and Engineering, Zaozhuang University, Zaozhuang, 277160, China.
| |
|
16
|
Inference of Molecular Regulatory Systems Using Statistical Path-Consistency Algorithm. ENTROPY 2022; 24:e24050693. [PMID: 35626576 PMCID: PMC9142129 DOI: 10.3390/e24050693] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2022] [Revised: 05/12/2022] [Accepted: 05/12/2022] [Indexed: 11/16/2022]
Abstract
One of the key challenges in systems biology and molecular sciences is how to infer regulatory relationships between genes and proteins using high-throughput omics datasets. Although a wide range of methods have been designed to reverse engineer regulatory networks, recent studies show that the inferred network may depend on the variable order in the dataset. In this work, we develop a new algorithm, called the statistical path-consistency algorithm (SPCA), to solve the problem of dependence on variable order. This method generates a number of different variable orders using random samples, and then infers a network by using the path-consistency algorithm based on each variable order. We propose measures to determine the edge weights using the corresponding edge weights in the inferred networks, and choose the edges with the largest weights as the putative regulations between genes or proteins. The developed method is rigorously assessed on the six benchmark networks in DREAM challenges, the mitogen-activated protein (MAP) kinase pathway, and a cancer-specific gene regulatory network. The inferred networks are compared with those obtained by using two up-to-date inference methods. The accuracy of the inferred networks shows that the developed method is effective for discovering molecular regulatory systems.
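The order-resampling idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' SPCA implementation: `aggregate_over_orders` and `toy_inference` are hypothetical names, the toy routine merely stands in for a real path-consistency run, and using the fraction of runs that report an edge as its weight is an assumption (the abstract does not specify the weighting measures).

```python
import random
from collections import Counter


def aggregate_over_orders(variables, infer_edges, n_orders=200, seed=0):
    """Run an order-dependent inference routine on shuffled variable orders.

    Returns a dict mapping each edge to the fraction of runs that reported it.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_orders):
        order = list(variables)
        rng.shuffle(order)                 # one random variable order
        counts.update(infer_edges(order))  # count every edge found in this run
    return {edge: count / n_orders for edge, count in counts.items()}


def toy_inference(order):
    """Hypothetical stand-in for a path-consistency run (not the real algorithm)."""
    edges = {frozenset(("A", "B"))}          # reported under every order
    if order.index("B") < order.index("C"):  # reported only under some orders
        edges.add(frozenset(("B", "C")))
    return edges


weights = aggregate_over_orders(["A", "B", "C"], toy_inference)
# Keep only edges that are stable across variable orders.
stable = [edge for edge, weight in weights.items() if weight >= 0.9]
```

In this toy setting the A-B edge survives because every order reports it, while the order-sensitive B-C edge receives a lower weight and is filtered out by the (arbitrary) 0.9 threshold.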
|
17
|
Salas-Zárate R, Alor-Hernández G, Salas-Zárate MDP, Paredes-Valverde MA, Bustos-López M, Sánchez-Cervantes JL. Detecting Depression Signs on Social Media: A Systematic Literature Review. Healthcare (Basel) 2022; 10:healthcare10020291. [PMID: 35206905 PMCID: PMC8871802 DOI: 10.3390/healthcare10020291] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 12/28/2021] [Revised: 01/21/2022] [Accepted: 01/29/2022] [Indexed: 01/14/2023] Open
Abstract
Among mental health diseases, depression is one of the most severe, as it often leads to suicide; due to this, it is important to identify and summarize existing evidence concerning depression sign detection research on social media using the data provided by users. This review examines aspects of primary studies exploring depression detection from social media submissions (from 2016 to mid-2021). The search for primary studies was conducted in five digital libraries: ACM Digital Library, IEEE Xplore Digital Library, SpringerLink, Science Direct, and PubMed, as well as on the search engine Google Scholar to broaden the results. Extracting and synthesizing the data from each paper was the main activity of this work. Thirty-four primary studies were analyzed and evaluated. Twitter was the most studied social media for depression sign detection. Word embedding was the most prominent linguistic feature extraction method. Support vector machine (SVM) was the most used machine-learning algorithm. Similarly, the most popular computing tool was from Python libraries. Finally, cross-validation (CV) was the most common statistical analysis method used to evaluate the results obtained. Using social media along with computing tools and classification methods contributes to current efforts in public healthcare to detect signs of depression from sources close to patients.
Affiliation(s)
- Rafael Salas-Zárate
- Tecnológico Nacional de México/I. T. Orizaba, Av. Oriente 9 No. 852, Col. Emiliano Zapata, Orizaba 94320, Veracruz, Mexico
| | - Giner Alor-Hernández
- Tecnológico Nacional de México/I. T. Orizaba, Av. Oriente 9 No. 852, Col. Emiliano Zapata, Orizaba 94320, Veracruz, Mexico
- Correspondence: Tel.: +52-(272)-725-7056
| | - María del Pilar Salas-Zárate
- Tecnológico Nacional de México/I.T.S. Teziutlán, Fracción I y II S/N, Aire Libre, Teziutlán 73960, Puebla, Mexico
| | - Mario Andrés Paredes-Valverde
- Tecnológico Nacional de México/I.T.S. Teziutlán, Fracción I y II S/N, Aire Libre, Teziutlán 73960, Puebla, Mexico
| | - Maritza Bustos-López
- Centro de Investigación en Inteligencia Artificial/Universidad Veracruzana, Sebastián Camacho 5, Zona Centro, Centro, Xalapa-Enríquez 91000, Veracruz, Mexico
| | - José Luis Sánchez-Cervantes
- CONACYT-Tecnológico Nacional de México/I. T. Orizaba, Av. Oriente 9 No. 852, Col. Emiliano Zapata, Orizaba 94320, Veracruz, Mexico
| |
|
18
|
Zheng L, Liu Z, Yang Y, Shen HB. Accurate inference of gene regulatory interactions from spatial gene expression with deep contrastive learning. Bioinformatics 2022; 38:746-753. [PMID: 34664632 DOI: 10.1093/bioinformatics/btab718] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 07/18/2021] [Revised: 09/19/2021] [Accepted: 10/15/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION Reverse engineering of gene regulatory networks (GRNs) has long been an attractive research topic in systems biology. Computational prediction of gene regulatory interactions has remained a challenging problem due to the complexity of gene expression and scarce information resources. High-throughput spatial gene expression data, such as in situ hybridization images that exhibit temporal and spatial expression patterns, provide abundant and reliable information for the inference of GRNs. However, computational tools for analyzing spatial gene expression data are highly underdeveloped. RESULTS In this study, we develop a new method for identifying gene regulatory interactions from gene expression images, called ConGRI. The method features a contrastive learning scheme and a deep Siamese convolutional neural network architecture, which automatically learns high-level feature embeddings for the expression images and then feeds the embeddings to an artificial neural network to determine whether or not the interaction exists. We apply the method to a Drosophila embryogenesis dataset and identify GRNs of eye development and mesoderm development. Experimental results show that ConGRI outperforms previous traditional and deep learning methods by a large margin, achieving accuracies of 76.7% and 68.7% for the GRNs of early eye development and mesoderm development, respectively. It also reveals some master regulators for Drosophila eye development. AVAILABILITY AND IMPLEMENTATION https://github.com/lugimzheng/ConGRI. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Lujing Zheng
- Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
- SJTU Paris Elite Institute of Technology (SPEIT), Shanghai Jiao Tong University, Shanghai 200240, China
| | - Zhenhuan Liu
- Department of Automation, Shanghai Jiao Tong University, Shanghai 200240, China
| | - Yang Yang
- Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
- Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai 200240, China
| | - Hong-Bin Shen
- Department of Automation, Shanghai Jiao Tong University, Shanghai 200240, China
- Institute of Image Processing and Pattern Recognition and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai Jiao Tong University, Shanghai 200240, China
| |
|
19
|
Grimes T, Datta S. A novel probabilistic generator for large-scale gene association networks. PLoS One 2021; 16:e0259193. [PMID: 34767561 PMCID: PMC8589155 DOI: 10.1371/journal.pone.0259193] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2021] [Accepted: 10/14/2021] [Indexed: 11/18/2022] Open
Abstract
MOTIVATION Gene expression data provide an opportunity for reverse-engineering gene-gene associations using network inference methods. However, it is difficult to assess the performance of these methods because the true underlying network is unknown in real data. Current benchmarks address this problem by subsampling a known regulatory network to conduct simulations. But the topology of regulatory networks can vary greatly across organisms or tissues, and reference-based generators, such as GeneNetWeaver, are not designed to capture this heterogeneity. This means, for example, that benchmark results from the E. coli regulatory network will not carry over to other organisms or tissues. In contrast, probabilistic generators do not require a reference network, and they have the potential to capture a rich distribution of topologies. This makes probabilistic generators an ideal approach for obtaining a robust benchmarking of network inference methods. RESULTS We propose a novel probabilistic network generator that (1) provides an alternative that addresses the inherent limitation of reference-based generators, (2) is able to create realistic gene association networks, and (3) captures the heterogeneity found across gold-standard networks better than existing generators used in practice. Eight organism-specific and 12 human tissue-specific gold-standard association networks are considered. Several measures of global topology are used to determine the similarity of generated networks to the gold standards. Along with demonstrating the variability of network structure across organisms and tissues, we show that the commonly used "scale-free" model is insufficient for replicating these structures. AVAILABILITY This generator is implemented in the R package "SeqNet" and is available on CRAN (https://cran.r-project.org/web/packages/SeqNet/index.html).
Affiliation(s)
- Tyler Grimes
- Department of Biostatistics, University of Florida, Gainesville, Florida, United States of America
| | - Somnath Datta
- Department of Biostatistics, University of Florida, Gainesville, Florida, United States of America
| |
|
20
|
Liu W, Jiang Y, Peng L, Sun X, Gan W, Zhao Q, Tang H. Inferring Gene Regulatory Networks Using the Improved Markov Blanket Discovery Algorithm. Interdiscip Sci 2021; 14:168-181. [PMID: 34495484 DOI: 10.1007/s12539-021-00478-9] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Received: 05/07/2021] [Revised: 08/22/2021] [Accepted: 08/24/2021] [Indexed: 11/26/2022]
Abstract
Inferring gene regulatory networks (GRNs) from microarray data can help us understand the mechanisms of life and eventually develop effective therapies. Currently, many computational methods have been used in inferring GRNs. However, owing to high-dimensional data and small samples, these methods often tend to introduce redundant regulatory relationships. Therefore, a novel network inference method based on the improved Markov blanket discovery algorithm, IMBDANET, is proposed to infer GRNs. Specifically, for each target gene, data processing inequality was applied to the Markov blanket discovery algorithm for the accurate differentiation of direct regulatory genes from indirect regulatory genes. Finally, direct regulatory genes were used in constructing GRNs, and the network structure was optimized according to the importance degree score. Experimental results on six public network datasets show that the proposed method can be effectively used to infer GRNs.
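The data processing inequality (DPI) mentioned above can be sketched in a few lines of Python. This is an ARACNE-style pruning of gene triangles, shown only to illustrate the inequality; it is not the IMBDANET procedure, and the function name `dpi_prune`, the `tolerance` parameter and the mutual information values are hypothetical.

```python
from itertools import combinations


def dpi_prune(mi, tolerance=0.0):
    """Return the edges kept after data-processing-inequality pruning.

    mi maps frozenset({gene_a, gene_b}) to a mutual information estimate.
    In every fully connected gene triplet, the weakest edge is treated as
    indirect when it is strictly weaker than both other edges.
    """
    genes = sorted({gene for edge in mi for gene in edge})
    removed = set()
    for a, b, c in combinations(genes, 3):
        triangle = [frozenset((a, b)), frozenset((a, c)), frozenset((b, c))]
        if not all(edge in mi for edge in triangle):
            continue  # DPI is only checked on complete triangles
        weakest = min(triangle, key=lambda edge: mi[edge])
        others = [edge for edge in triangle if edge != weakest]
        if all(mi[weakest] < mi[edge] * (1 - tolerance) for edge in others):
            removed.add(weakest)
    return {edge for edge in mi if edge not in removed}


# Illustrative values: A regulates B, B regulates C, A-C is only indirect.
mi_scores = {
    frozenset(("A", "B")): 0.9,
    frozenset(("B", "C")): 0.8,
    frozenset(("A", "C")): 0.3,
}
kept = dpi_prune(mi_scores)
```

All triangles are evaluated against the original scores, so the result does not depend on the order in which triplets are visited.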
Affiliation(s)
- Wei Liu
- School of Computer Science, Xiangtan University, Xiangtan, 411105, China
- Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan, 411105, China
| | - Yi Jiang
- School of Computer Science, Xiangtan University, Xiangtan, 411105, China
| | - Li Peng
- School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan, 411201, China
| | - Xingen Sun
- School of Computer Science, Xiangtan University, Xiangtan, 411105, China
| | - Wenqing Gan
- School of Computer Science, Xiangtan University, Xiangtan, 411105, China
| | - Qi Zhao
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan, 114051, China.
| | - Huanrong Tang
- School of Computer Science, Xiangtan University, Xiangtan, 411105, China.
| |
|
21
|
Manica M, Bunne C, Mathis R, Cadow J, Ahsen ME, Stolovitzky GA, Martínez MR. COSIFER: a Python package for the consensus inference of molecular interaction networks. Bioinformatics 2021; 37:2070-2072. [PMID: 33241320 PMCID: PMC8337002 DOI: 10.1093/bioinformatics/btaa942] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 04/17/2020] [Revised: 09/25/2020] [Accepted: 10/26/2020] [Indexed: 11/23/2022] Open
Abstract
Summary The advent of high-throughput technologies has provided researchers with measurements of thousands of molecular entities and enabled the investigation of the internal regulatory apparatus of the cell. However, network inference from high-throughput data is far from being a solved problem. While a plethora of different inference methods have been proposed, they often lead to non-overlapping predictions, and many of them lack user-friendly implementations to enable their broad utilization. Here, we present Consensus Interaction Network Inference Service (COSIFER), a package and a companion web-based platform to infer molecular networks from expression data using state-of-the-art consensus approaches. COSIFER includes a selection of state-of-the-art methodologies for network inference and different consensus strategies to integrate the predictions of individual methods and generate robust networks. Availability and implementation COSIFER Python source code is available at https://github.com/PhosphorylatedRabbits/cosifer. The web service is accessible at https://ibm.biz/cosifer-aas. Supplementary information Supplementary data are available at Bioinformatics online.
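One simple consensus strategy of the kind described here is rank averaging across methods. The Python sketch below illustrates that idea only; it does not use or reproduce the COSIFER API, and the function names, method labels and scores are all hypothetical.

```python
def rank_edges(scores):
    """Map each edge to its rank under one method (1 = strongest score)."""
    ordered = sorted(scores, key=lambda edge: (-scores[edge], edge))
    return {edge: position for position, edge in enumerate(ordered, start=1)}


def consensus_ranking(method_scores):
    """Order edges by their mean rank across all inference methods.

    method_scores maps a method name to a dict of edge -> score.
    Only edges scored by every method are ranked.
    """
    edges = set.intersection(*(set(scores) for scores in method_scores.values()))
    ranks = [
        rank_edges({edge: scores[edge] for edge in edges})
        for scores in method_scores.values()
    ]
    mean_rank = {edge: sum(r[edge] for r in ranks) / len(ranks) for edge in edges}
    return sorted(edges, key=lambda edge: (mean_rank[edge], edge))


# Hypothetical scores from three inference methods on the same three edges.
predictions = {
    "correlation": {("A", "B"): 0.9, ("A", "C"): 0.2, ("B", "C"): 0.5},
    "mutual_info": {("A", "B"): 0.7, ("A", "C"): 0.1, ("B", "C"): 0.8},
    "tree_based": {("A", "B"): 0.6, ("A", "C"): 0.3, ("B", "C"): 0.4},
}
consensus = consensus_ranking(predictions)
```

Working on ranks rather than raw scores avoids having to calibrate methods whose scores live on different scales.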
Affiliation(s)
- Matteo Manica
- Cognitive Computing and Industry Solutions, IBM Research Europe, Rüschlikon, ZH 8803, Switzerland; Institute of Molecular Systems Biology, ETH Zürich, Zürich, ZH 8093, Switzerland
| | - Charlotte Bunne
- Cognitive Computing and Industry Solutions, IBM Research Europe, Rüschlikon, ZH 8803, Switzerland; Institute for Machine Learning, ETH Zürich, Zürich, ZH 8092, Switzerland
| | - Roland Mathis
- Cognitive Computing and Industry Solutions, IBM Research Europe, Rüschlikon, ZH 8803, Switzerland
| | - Joris Cadow
- Cognitive Computing and Industry Solutions, IBM Research Europe, Rüschlikon, ZH 8803, Switzerland
| | - Mehmet Eren Ahsen
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029-5674, USA
| | - Gustavo A Stolovitzky
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029-5674, USA; Translational Systems Biology and Nanobiotechnology, IBM T.J. Watson Research Center, Yorktown Heights, NY 10598, USA
| | - María Rodríguez Martínez
- Cognitive Computing and Industry Solutions, IBM Research Europe, Rüschlikon, ZH 8803, Switzerland
| |
|
22
|
Mishra S, Srivastava D, Kumar V. Improving gene network inference with graph wavelets and making insights about ageing-associated regulatory changes in lungs. Brief Bioinform 2021; 22:bbaa360. [PMID: 33381809 PMCID: PMC7799288 DOI: 10.1093/bib/bbaa360] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/04/2020] [Revised: 10/12/2020] [Accepted: 11/10/2020] [Indexed: 01/20/2023] Open
Abstract
Using a gene-regulatory-network-based approach for single-cell expression profiles can reveal unprecedented details about the effects of external and internal factors. However, noise and batch effects in sparse single-cell expression profiles can hamper correct estimation of dependencies among genes and regulatory changes. Here, we devise a conceptually different method using graph-wavelet filters for improving gene network (GWNet)-based analysis of the transcriptome. Our approach improved the performance of several gene network-inference methods. Most importantly, GWNet improved consistency in the prediction of gene regulatory networks using single-cell transcriptomes even in the presence of batch effects. The consistency of the predicted gene networks enabled reliable estimates of changes in the influence of genes not highlighted by differential-expression analysis. Applying GWNet to the single-cell transcriptome profile of lung cells revealed biologically relevant changes in the influence of pathways and master regulators due to ageing. Surprisingly, the regulatory influence of ageing on type II pneumocytes showed noticeable similarity with patterns due to the effect of novel coronavirus infection in human lung.
|
23
|
He W, Tang J, Zou Q, Guo F. MMFGRN: a multi-source multi-model fusion method for gene regulatory network reconstruction. Brief Bioinform 2021; 22:6261916. [PMID: 33939795 DOI: 10.1093/bib/bbab166] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 01/26/2021] [Revised: 03/08/2021] [Accepted: 04/08/2021] [Indexed: 01/05/2023] Open
Abstract
Many biological processes, such as the growth and differentiation of cells and the occurrence and development of diseases, are controlled by gene regulatory networks (GRNs). Therefore, sustained research on GRNs is important. Determining gene-gene relationships from gene expression data is a complex issue. Because it is difficult to efficiently obtain the regularity behind gene-gene relationships by relying only on biochemical experimental methods, various computational methods have been used to construct GRNs, and some achievements have been made. In this paper, we propose a novel method, MMFGRN (for "Multi-source Multi-model Fusion for Gene Regulatory Network reconstruction"), to reconstruct the GRN. In order to make full use of the limited datasets and explore the potential regulatory relationships contained in different data types, we construct the MMFGRN model from three perspectives: a single time series data model, a single steady-data model, and a joint time series and steady-data model. We then utilize a weighted fusion strategy to obtain the final global regulatory link ranking. The MMFGRN model yields the best performance on the DREAM4 InSilico_Size10 data, outperforming other popular inference algorithms, with an overall area under the receiver operating characteristic curve (AUROC) score of 0.909 and an area under the precision-recall curve (AUPR) score of 0.770 on the 10-gene network. Additionally, as the network scale increases, our method retains certain advantages, with an overall AUPR score of 0.335 on the DREAM4 InSilico_Size100 data. These results demonstrate the good robustness of MMFGRN on different scales of networks. At the same time, the integration strategy proposed in this paper provides a new idea for the reconstruction of biological network models without prior knowledge, which can help researchers to decipher the elusive mechanism of life.
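The two evaluation scores quoted in this abstract can be computed from a ranked list of candidate edges. The Python sketch below is a generic illustration with made-up scores and labels, not the DREAM4 evaluation script: the area under the ROC curve is obtained as the fraction of correctly ordered (true edge, false edge) pairs, and average precision is used as a common estimator of the area under the precision-recall curve.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def average_precision(scores, labels):
    """Mean of the precision values measured at the rank of each true edge."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this recall point
    return total / hits


# Hypothetical confidence scores for five candidate edges and their true status.
edge_scores = [0.9, 0.8, 0.7, 0.6, 0.5]
edge_labels = [1, 0, 1, 0, 0]
roc_value = auroc(edge_scores, edge_labels)
pr_value = average_precision(edge_scores, edge_labels)
```

Because true edges are usually rare in GRN benchmarks, the precision-recall score tends to be the more informative of the two, which is consistent with the much lower AUPR values reported for the 100-gene networks.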
Affiliation(s)
| | | | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| | - Fei Guo
- College of Intelligence and Computing, Tianjin University, Tianjin, China
| |
|
24
|
Mousavi R, Konuru SH, Lobo D. Inference of dynamic spatial GRN models with multi-GPU evolutionary computation. Brief Bioinform 2021; 22:6217729. [PMID: 33834216 DOI: 10.1093/bib/bbab104] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 12/10/2020] [Revised: 02/15/2021] [Accepted: 03/09/2021] [Indexed: 02/06/2023] Open
Abstract
Reverse engineering mechanistic gene regulatory network (GRN) models with a specific dynamic spatial behavior is an inverse problem without analytical solutions in general. Instead, heuristic machine learning algorithms have been proposed to infer the structure and parameters of a system of equations able to recapitulate a given gene expression pattern. However, these algorithms are computationally intensive as they need to simulate millions of candidate models, which limits their applicability and requires high computational resources. Graphics processing unit (GPU) computing is an affordable alternative for accelerating large-scale scientific computation, yet no method is currently available to exploit GPU technology for the reverse engineering of mechanistic GRNs from spatial phenotypes. Here we present an efficient methodology to parallelize evolutionary algorithms using GPU computing for the inference of mechanistic GRNs that can develop a given gene expression pattern in a multicellular tissue area or cell culture. The proposed approach is based on multi-CPU threads running the lightweight crossover, mutation and selection operators and launching GPU kernels asynchronously. Kernels can run in parallel in a single or multiple GPUs and each kernel simulates and scores the error of a model using the thread parallelism of the GPU. We tested this methodology for the inference of spatiotemporal mechanistic gene regulatory networks (GRNs), including topology and parameters, that can develop a given 2D gene expression pattern. The results show a 700-fold speedup with respect to a single CPU implementation. This approach can streamline the extraction of knowledge from biological and medical datasets and accelerate the automatic design of GRNs for synthetic biology applications.
Affiliation(s)
- Reza Mousavi
- Department of Biological Sciences at the University of Maryland, Baltimore, MD 21250, USA
| | - Sri Harsha Konuru
- Department of Biological Sciences at the University of Maryland, Baltimore, MD 21250, USA
| | - Daniel Lobo
- Department of Biological Sciences at the University of Maryland, Baltimore, MD 21250, USA
| |
|
25
|
Wang Y, Zhou M, Zou Q, Xu L. Machine learning for phytopathology: from the molecular scale towards the network scale. Brief Bioinform 2021; 22:6204793. [PMID: 33787847 DOI: 10.1093/bib/bbab037] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 12/15/2020] [Revised: 01/09/2021] [Accepted: 01/26/2021] [Indexed: 01/16/2023] Open
Abstract
With the increasing volume of high-throughput sequencing data from a variety of omics techniques in the field of plant-pathogen interactions, sorting, retrieving, processing and visualizing biological information have become a great challenge. Within the explosion of data, machine learning offers powerful tools to process these complex omics data by various algorithms, such as Bayesian reasoning, support vector machine and random forest. Here, we introduce the basic frameworks of machine learning in dissecting plant-pathogen interactions and discuss the applications and advances of machine learning in plant-pathogen interactions from molecular to network biology, including the prediction of pathogen effectors, plant disease resistance protein monitoring and the discovery of protein-protein networks. The aim of this review is to provide a summary of advances in plant defense and pathogen infection and to indicate the important developments of machine learning in phytopathology.
Affiliation(s)
- Yansu Wang
- Postdoctoral Innovation Practice Base, Shenzhen Polytechnic, China
| | | | - Quan Zou
- University of Electronic Science and Technology of China
| | - Lei Xu
- Shenzhen Polytechnic, China
| |
|
26
|
Muzio G, O’Bray L, Borgwardt K. Biological network analysis with deep learning. Brief Bioinform 2021; 22:1515-1530. [PMID: 33169146 PMCID: PMC7986589 DOI: 10.1093/bib/bbaa257] [Citation(s) in RCA: 96] [Impact Index Per Article: 24.0] [Received: 04/30/2020] [Revised: 08/26/2020] [Accepted: 09/11/2020] [Indexed: 12/17/2022] Open
Abstract
Recent advancements in experimental high-throughput technologies have expanded the availability and quantity of molecular data in biology. Given the importance of interactions in biological processes, such as the interactions between proteins or the bonds within a chemical compound, this data is often represented in the form of a biological network. The rise of this data has created a need for new computational tools to analyze networks. One major trend in the field is to use deep learning for this goal and, more specifically, to use methods that work with networks, the so-called graph neural networks (GNNs). In this article, we describe biological networks and review the principles and underlying algorithms of GNNs. We then discuss domains in bioinformatics in which graph neural networks are frequently being applied at the moment, such as protein function prediction, protein-protein interaction prediction and in silico drug discovery and development. Finally, we highlight application areas such as gene regulatory networks and disease diagnosis where deep learning is emerging as a new tool to answer classic questions like gene interaction prediction and automatic disease prediction from data.
Affiliation(s)
- Giulia Muzio
- Machine Learning and Computational Biology Lab at ETH Zürich
| | - Leslie O’Bray
- Machine Learning and Computational Biology Lab at ETH Zürich
| | | |
|
27
|
Muzio G, O'Bray L, Borgwardt K. Biological network analysis with deep learning. Brief Bioinform 2021; 22:1515-1530. [PMID: 33169146 DOI: 10.1145/3447548.3467442] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/30/2020] [Revised: 08/26/2020] [Accepted: 09/11/2020] [Indexed: 05/28/2023] Open
Abstract
Recent advancements in experimental high-throughput technologies have expanded the availability and quantity of molecular data in biology. Given the importance of interactions in biological processes, such as the interactions between proteins or the bonds within a chemical compound, this data is often represented in the form of a biological network. The rise of this data has created a need for new computational tools to analyze networks. One major trend in the field is to use deep learning for this goal and, more specifically, to use methods that work with networks, the so-called graph neural networks (GNNs). In this article, we describe biological networks and review the principles and underlying algorithms of GNNs. We then discuss domains in bioinformatics in which graph neural networks are frequently being applied at the moment, such as protein function prediction, protein-protein interaction prediction and in silico drug discovery and development. Finally, we highlight application areas such as gene regulatory networks and disease diagnosis where deep learning is emerging as a new tool to answer classic questions like gene interaction prediction and automatic disease prediction from data.
Affiliation(s)
- Giulia Muzio
- Machine Learning and Computational Biology Lab at ETH Zürich
| | - Leslie O'Bray
- Machine Learning and Computational Biology Lab at ETH Zürich
| | | |
|
28
|
Zhao M, He W, Tang J, Zou Q, Guo F. A comprehensive overview and critical evaluation of gene regulatory network inference technologies. Brief Bioinform 2021; 22:6128842. [PMID: 33539514 DOI: 10.1093/bib/bbab009] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Received: 09/28/2020] [Revised: 12/11/2020] [Accepted: 01/06/2021] [Indexed: 12/12/2022] Open
Abstract
A gene regulatory network (GRN) is an important mechanism for maintaining life processes, controlling biochemical reactions and regulating compound levels, and it plays an important role in various organisms and systems. Reconstructing GRNs can help us to understand the molecular mechanisms of organisms and to reveal the essential rules of a large number of biological processes and reactions in organisms. Various outstanding network reconstruction algorithms use specific assumptions that affect prediction accuracy, in order to deal with the uncertainty of processing. To study why a certain method is more suitable for a specific research problem or experimental dataset, we review methods under model-based, information-based and machine learning-based classifications. There are obviously different types of computational tools that can be generated to distinguish GRNs. Furthermore, we discuss several classical, representative and recent methods in each category to analyze their core ideas, general steps and characteristics. We compare the performance of state-of-the-art GRN reconstruction technologies on simulated networks and real networks under different scaling conditions. Through standardized performance metrics and common benchmarks, we quantitatively evaluate the stability of various methods and the sensitivity of the same algorithm when applied to networks of different scales. The aim of this study is to explore the most appropriate method for a specific GRN, which helps biologists and medical scientists in discovering potential drug targets and identifying cancer biomarkers.
Affiliation(s)
- Mengyuan Zhao
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
| | - Wenying He
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
| | - Jijun Tang
- University of South Carolina, Tianjin, China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| | - Fei Guo
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
| |
|
29
|
Ludl AA, Michoel T. Comparison between instrumental variable and mediation-based methods for reconstructing causal gene networks in yeast. Mol Omics 2021; 17:241-251. [PMID: 33438713 DOI: 10.1039/d0mo00140f] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/16/2022]
Abstract
Causal gene networks model the flow of information within a cell. Reconstructing causal networks from omics data is challenging because correlation does not imply causation. When genomics and transcriptomics data from a segregating population are combined, genomic variants can be used to orient the direction of causality between gene expression traits. Instrumental variable methods use a local expression quantitative trait locus (eQTL) as a randomized instrument for a gene's expression level, and assign target genes based on distal eQTL associations. Mediation-based methods additionally require that distal eQTL associations are mediated by the source gene. A detailed comparison between these methods has not yet been conducted, due to the lack of a standardized implementation of different methods, the limited sample size of most multi-omics datasets, and the absence of ground-truth networks for most organisms. Here we used Findr, a software package providing uniform implementations of instrumental variable, mediation, and coexpression-based methods, a recent dataset of 1012 segregants from a cross between two budding yeast strains, and the Yeastract database of known transcriptional interactions to compare causal gene network inference methods. We found that causal inference methods result in a significant overlap with the ground-truth, whereas coexpression did not perform better than random. A subsampling analysis revealed that the performance of mediation saturates at large sample sizes, due to a loss of sensitivity when residual correlations become significant. Instrumental variable methods on the other hand contain false positive predictions, due to genomic linkage between eQTL instruments. Instrumental variable and mediation-based methods also have complementary roles for identifying causal genes underlying transcriptional hotspots. 
Instrumental variable methods correctly predicted STB5 targets for a hotspot centred on the transcription factor STB5, whereas mediation failed due to Stb5p auto-regulating its own expression. Mediation suggests a new candidate gene, DNM1, for a hotspot on Chr XII, whereas instrumental variable methods could not distinguish between multiple genes located within the hotspot. In conclusion, causal inference from genomics and transcriptomics data is a powerful approach for reconstructing causal gene networks, which could be further improved by the development of methods to control for residual correlations in mediation analyses, and for genomic linkage and pleiotropic effects from transcriptional hotspots in instrumental variable analyses.
Affiliation(s)
- Adriaan-Alexander Ludl
- Computational Biology Unit, Department of Informatics, University of Bergen, PO Box 7803, 5020 Bergen, Norway.
| | | |
|
30
|
EnGRNT: Inference of gene regulatory networks using ensemble methods and topological feature extraction. Informatics in Medicine Unlocked 2021. [DOI: 10.1016/j.imu.2021.100773] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022] Open
|
31
|
Marku M, Verstraete N, Raynal F, Madrid-Mencía M, Domagala M, Fournié JJ, Ysebaert L, Poupot M, Pancaldi V. Insights on TAM Formation from a Boolean Model of Macrophage Polarization Based on In Vitro Studies. Cancers (Basel) 2020; 12:cancers12123664. [PMID: 33297362 PMCID: PMC7762229 DOI: 10.3390/cancers12123664] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 10/12/2020] [Revised: 11/26/2020] [Accepted: 11/30/2020] [Indexed: 12/24/2022] Open
Abstract
Simple Summary: The recent success of immunotherapy treatments against cancer relies on helping our own body’s defenses in the fight against tumours, namely reinvigorating the cancer killing action of T cells. Unfortunately, in a large proportion of patients these therapies are ineffective, in part due to the presence of other immune cells, macrophages, which are mis-educated by the cancer cells into promoting tumour growth. Here we start from an existing model of macrophage polarization and extend it to the specific conditions encountered inside a tumour by adding signals, receptors, transcription factors and cytokines that are known to be the key components in establishing the cancer cell-macrophage interaction. Then we use a mathematical Boolean model applied to a gene regulatory network of this biological process to simulate its temporal behaviour and explore scenarios that have not been experimentally tested so far. Additionally, the KO and overexpression simulations successfully reproduce the known experimental results while predicting the potential role of regulators (such as STAT1 and EGF) in preventing the formation of pro-tumoural macrophages, which can be tested experimentally.
Abstract: The tumour microenvironment is the surrounding of a tumour, including blood vessels, fibroblasts, signaling molecules, the extracellular matrix and immune cells, especially neutrophils and monocyte-derived macrophages. In a tumour setting, macrophages encompass a spectrum between a tumour-suppressive (M1) or tumour-promoting (M2) state. The biology of macrophages found in tumours (Tumour Associated Macrophages) remains unclear, but understanding their impact on tumour progression is highly important. In this paper, we perform a comprehensive analysis of a macrophage polarization network, following two lines of enquiry: (i) we reconstruct the macrophage polarization network based on literature, extending it to include important stimuli in a tumour setting, and (ii) we build a dynamical model able to reproduce macrophage polarization in the presence of different stimuli, including the contact with cancer cells. Our simulations recapitulate the documented macrophage phenotypes and their dependencies on specific receptors and transcription factors, while also unravelling the formation of a special type of tumour associated macrophages in an in vitro model of chronic lymphocytic leukaemia. This model constitutes the first step towards elucidating the cross-talk between immune and cancer cells inside tumours, with the ultimate goal of identifying new therapeutic targets that could control the formation of tumour associated macrophages in patients.
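The Boolean modelling approach described in this abstract can be illustrated with a minimal sketch. The Python example below is not the authors' macrophage model: the node names (`signal`, `A`, `B`), the update rules and the synchronous update scheme are all hypothetical, chosen only to show how a Boolean network is iterated until a state repeats and how a knockout (KO) can be simulated by clamping a node to `False`.

```python
# Toy synchronous Boolean network (hypothetical nodes and rules, for illustration only).

def step(state, rules):
    """Apply every update rule once, using the same previous state for all nodes."""
    return {node: rule(state) for node, rule in rules.items()}

def simulate(state, rules, max_steps=50):
    """Iterate until a state repeats (or max_steps is reached) and return the last state.

    A repeated state is either a fixed point or a state on a cycle.
    """
    seen = []
    while state not in seen and len(seen) < max_steps:
        seen.append(state)
        state = step(state, rules)
    return state

def knock_out(rules, node):
    """Return a copy of the rules in which `node` is clamped to False."""
    clamped = dict(rules)
    clamped[node] = lambda s: False
    return clamped

# Hypothetical rules: an external signal activates A and represses B,
# and A and B inhibit each other.
rules = {
    "signal": lambda s: s["signal"],  # input node keeps its value
    "A": lambda s: s["signal"] and not s["B"],
    "B": lambda s: (not s["signal"]) and (not s["A"]),
}

start = {"signal": True, "A": False, "B": False}
wild_type = simulate(start, rules)
a_knockout = simulate(start, knock_out(rules, "A"))
print(wild_type)
print(a_knockout)
```

Comparing `wild_type` with `a_knockout` mirrors, in miniature, the KO simulations the abstract mentions; an asynchronous update scheme would be an equally valid design choice and can give different attractors.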
Affiliation(s)
- Malvina Marku
- INSERM, Centre de Recherches en Cancérologie de Toulouse, 2 Avenue Hubert Curien, 31037 Toulouse, France; (N.V.); (F.R.); (M.M.-M.); (M.D.); (J.-J.F.); (L.Y.); (M.P.)
- Université III Toulouse Paul Sabatier, Route de Narbonne, 31330 Toulouse, France
- Correspondence: (M.M.); (V.P.); Tel.: +33-5-82-74-17-74 (M.M.)
| | - Nina Verstraete
- INSERM, Centre de Recherches en Cancérologie de Toulouse, 2 Avenue Hubert Curien, 31037 Toulouse, France; (N.V.); (F.R.); (M.M.-M.); (M.D.); (J.-J.F.); (L.Y.); (M.P.)
- Université III Toulouse Paul Sabatier, Route de Narbonne, 31330 Toulouse, France
| | - Flavien Raynal
- INSERM, Centre de Recherches en Cancérologie de Toulouse, 2 Avenue Hubert Curien, 31037 Toulouse, France; (N.V.); (F.R.); (M.M.-M.); (M.D.); (J.-J.F.); (L.Y.); (M.P.)
- Université III Toulouse Paul Sabatier, Route de Narbonne, 31330 Toulouse, France
| | - Miguel Madrid-Mencía
- INSERM, Centre de Recherches en Cancérologie de Toulouse, 2 Avenue Hubert Curien, 31037 Toulouse, France; (N.V.); (F.R.); (M.M.-M.); (M.D.); (J.-J.F.); (L.Y.); (M.P.)
- Université III Toulouse Paul Sabatier, Route de Narbonne, 31330 Toulouse, France
| | - Marcin Domagala
- INSERM, Centre de Recherches en Cancérologie de Toulouse, 2 Avenue Hubert Curien, 31037 Toulouse, France; (N.V.); (F.R.); (M.M.-M.); (M.D.); (J.-J.F.); (L.Y.); (M.P.)
- Université III Toulouse Paul Sabatier, Route de Narbonne, 31330 Toulouse, France
| | - Jean-Jacques Fournié
- INSERM, Centre de Recherches en Cancérologie de Toulouse, 2 Avenue Hubert Curien, 31037 Toulouse, France; (N.V.); (F.R.); (M.M.-M.); (M.D.); (J.-J.F.); (L.Y.); (M.P.)
- Université III Toulouse Paul Sabatier, Route de Narbonne, 31330 Toulouse, France
| | - Loïc Ysebaert
- INSERM, Centre de Recherches en Cancérologie de Toulouse, 2 Avenue Hubert Curien, 31037 Toulouse, France; (N.V.); (F.R.); (M.M.-M.); (M.D.); (J.-J.F.); (L.Y.); (M.P.)
- Université III Toulouse Paul Sabatier, Route de Narbonne, 31330 Toulouse, France
- Service d’Hématologie, Institut Universitaire du Cancer de Toulouse-Oncopole, 31330 Toulouse, France
| | - Mary Poupot
- INSERM, Centre de Recherches en Cancérologie de Toulouse, 2 Avenue Hubert Curien, 31037 Toulouse, France; (N.V.); (F.R.); (M.M.-M.); (M.D.); (J.-J.F.); (L.Y.); (M.P.)
- Université III Toulouse Paul Sabatier, Route de Narbonne, 31330 Toulouse, France
| | - Vera Pancaldi
- INSERM, Centre de Recherches en Cancérologie de Toulouse, 2 Avenue Hubert Curien, 31037 Toulouse, France; (N.V.); (F.R.); (M.M.-M.); (M.D.); (J.-J.F.); (L.Y.); (M.P.)
- Université III Toulouse Paul Sabatier, Route de Narbonne, 31330 Toulouse, France
- Barcelona Supercomputing Center, Carrer de Jordi Girona, 29, 31, 08034 Barcelona, Spain
- Correspondence: (M.M.); (V.P.); Tel.: +33-5-82-74-17-74 (M.M.)
| |
|
32
|
Wang J, Ma A, Ma Q, Xu D, Joshi T. Inductive inference of gene regulatory network using supervised and semi-supervised graph neural networks. Comput Struct Biotechnol J 2020; 18:3335-3343. [PMID: 33294129 PMCID: PMC7677691 DOI: 10.1016/j.csbj.2020.10.022] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 08/01/2020] [Revised: 10/19/2020] [Accepted: 10/21/2020] [Indexed: 12/15/2022] Open
Abstract
A novel graph classification formulation for inferring gene regulatory relationships. A graph neural network can ensemble the strengths of heuristic skeletons. Our results show GRGNN outperforms previous methods inductively on benchmarks. GRGNN can be interpreted following the biological network motif hypothesis.
Discovering gene regulatory relationships and reconstructing gene regulatory networks (GRN) based on gene expression data is a classical, long-standing computational challenge in bioinformatics. Computationally inferring a possible regulatory relationship between two genes can be formulated as a link prediction problem between two nodes in a graph. Graph neural network (GNN) provides an opportunity to construct GRN by integrating topological neighbor propagation through the whole gene network. We propose an end-to-end gene regulatory graph neural network (GRGNN) approach to reconstruct GRNs from scratch utilizing the gene expression data, in both a supervised and a semi-supervised framework. To get better inductive generalization capability, GRN inference is formulated as a graph classification problem, to distinguish whether a subgraph centered at two nodes contains the link between the two nodes. A linked pair between a transcription factor (TF) and a target gene, and their neighbors are labeled as a positive subgraph, while an unlinked TF and target gene pair and their neighbors are labeled as a negative subgraph. A GNN model is constructed with node features from both explicit gene expression and graph embedding. We demonstrate that a noisy starting graph structure built from partial information, such as Pearson’s correlation coefficient and mutual information, can help guide the GRN inference through an appropriate ensemble technique. Furthermore, a semi-supervised scheme is implemented to increase the quality of the classifier. When compared with established methods, GRGNN achieved state-of-the-art performance on the DREAM5 GRN inference benchmarks. GRGNN is publicly available at https://github.com/juexinwang/GRGNN.
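The noisy starting graph built from Pearson's correlation coefficient, as mentioned in the abstract above, can be sketched as follows. This Python example is only an illustration under stated assumptions, not the GRGNN implementation: the gene names, the toy expression values, the helper names (`pearson`, `skeleton_edges`) and the 0.8 threshold are hypothetical. It connects two genes whenever the absolute Pearson correlation of their expression profiles reaches the threshold, giving a heuristic skeleton that a downstream classifier could refine.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def skeleton_edges(expression, threshold=0.8):
    """Return gene pairs whose |Pearson r| is at least `threshold` (hypothetical cut-off)."""
    edges = set()
    for g1, g2 in combinations(sorted(expression), 2):
        if abs(pearson(expression[g1], expression[g2])) >= threshold:
            edges.add((g1, g2))
    return edges

# Toy expression profiles (made-up values, four samples per gene).
expression = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [4.0, 1.0, 3.0, 2.0],
}
edges = skeleton_edges(expression)
print(edges)
```

Correlation alone cannot orient an edge or separate direct from indirect regulation, which is why such a skeleton is treated as noisy prior structure and not as the inferred network.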
Affiliation(s)
- Juexin Wang
- Department of Electrical Engineering and Computer Science, and Christopher S. Bond Life Science Center, University of Missouri, 65211, USA
| | - Anjun Ma
- Department of Biomedical Informatics, School of Medicine, Ohio State University, OH 43210, USA
| | - Qin Ma
- Department of Biomedical Informatics, School of Medicine, Ohio State University, OH 43210, USA
| | - Dong Xu
- Department of Electrical Engineering and Computer Science, and Christopher S. Bond Life Science Center, University of Missouri, 65211, USA
| | - Trupti Joshi
- Department of Health Management and Informatics, Institute for Data Science and Informatics, University of Missouri, 65211, USA; Department of Electrical Engineering and Computer Science, and Christopher S. Bond Life Science Center, University of Missouri, 65211, USA
| |
|
33
|
Liu W, Sun X, Peng L, Zhou L, Lin H, Jiang Y. RWRNET: A Gene Regulatory Network Inference Algorithm Using Random Walk With Restart. Front Genet 2020; 11:591461. [PMID: 33101398 PMCID: PMC7545090 DOI: 10.3389/fgene.2020.591461] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 08/04/2020] [Accepted: 09/02/2020] [Indexed: 11/30/2022] Open
Abstract
Inferring gene regulatory networks from expression data is essential in identifying complex regulatory relationships among genes and revealing the mechanism of certain diseases. Various computation methods have been developed for inferring gene regulatory networks. However, these methods focus on the local topology of the network rather than on the global topology. From network optimisation standpoint, emphasising the global topology of the network also reduces redundant regulatory relationships. In this study, we propose a novel network inference algorithm using Random Walk with Restart (RWRNET) that combines local and global topology relationships. The method first captures the local topology through three elements of random walk and then combines the local topology with the global topology by Random Walk with Restart. The Markov Blanket discovery algorithm is then used to deal with isolated genes. The proposed method is compared with several state-of-the-art methods on the basis of six benchmark datasets. Experimental results demonstrated the effectiveness of the proposed method.
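For readers unfamiliar with Random Walk with Restart, a generic sketch follows. It is not the RWRNET algorithm itself, only the standard iteration p_next = (1 - r) * W * p + r * e on a column-normalised adjacency matrix W, where e is the indicator vector of the seed node. The function name, the restart probability of 0.3 and the three-node example graph are assumptions made for illustration.

```python
def random_walk_with_restart(adjacency, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities of a walker that restarts at `seed`.

    `adjacency` is a square list-of-lists of non-negative edge weights.
    """
    n = len(adjacency)
    # Column-normalise: W[i][j] is the probability of moving from node j to node i.
    # Nodes without outgoing weight get an all-zero column (probability leaks there).
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    W = [[adjacency[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
         for i in range(n)]
    e = [0.0] * n
    e[seed] = 1.0
    p = e[:]
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(W[i][j] * p[j] for j in range(n)) + restart * e[i]
               for i in range(n)]
        # Stop when the L1 change between successive iterates is negligible.
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p

# Hypothetical example: a path graph 0 - 1 - 2, with the walk restarting at node 0.
path_graph = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
scores = random_walk_with_restart(path_graph, seed=0)
print(scores)
```

The resulting scores decrease with distance from the seed, which is the sense in which the restart term balances local proximity against the global topology of the graph.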
Affiliation(s)
- Wei Liu
- School of Computer Science, Xiangtan University, Xiangtan, China; Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan, China
| | - Xingen Sun
- School of Computer Science, Xiangtan University, Xiangtan, China
| | - Li Peng
- School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan, China
| | - Lili Zhou
- School of Computer Science, Xiangtan University, Xiangtan, China
| | - Hui Lin
- School of Computer Science, Xiangtan University, Xiangtan, China
| | - Yi Jiang
- School of Computer Science, Xiangtan University, Xiangtan, China
| |
|
34
|
Shi M, Sheng Z, Tang H. Prognostic outcome prediction by semi-supervised least squares classification. Brief Bioinform 2020; 22:5935498. [PMID: 33094318 DOI: 10.1093/bib/bbaa249] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 06/15/2020] [Revised: 09/04/2020] [Accepted: 09/04/2020] [Indexed: 11/13/2022] Open
Abstract
Although great progress has been made in prognostic outcome prediction, small sample size remains a challenge in obtaining accurate and robust classifiers. We proposed the Rescaled linear square Regression based Least Squares Learning (RRLSL), a jointly developed semi-supervised feature selection and classifier, for predicting prognostic outcome of cancer patients. RRLSL used the least square regression to identify the scale factors and then rank the features in available multiple types of molecular data. We applied the unlabeled multiple molecular data in conjunction with the labeled data to develop a similarity graph. RRLSL produced the constraint with kernel functions to bridge the gap between label information and geometry information from messenger RNA and microRNA expression profiling. Importantly, this semi-supervised model proposed the least squares learning with L2 regularization to develop a semi-supervised classifier. RRLSL suggested the performance improvement in the prognostic outcome prediction and successfully discriminated between the recurrent patients and non-recurrent ones. We also demonstrated that RRLSL improved the accuracy and Area Under the Precision Recall Curve (AUPRC) as compared to the baseline semi-supervised methods. RRLSL is available as a stand-alone software package (https://github.com/ShiMGLab/RRLSL).
A short abstract: We proposed the Rescaled linear square Regression based Least Squares Learning (RRLSL), a jointly developed semi-supervised feature selection and classifier, for predicting prognostic outcome of cancer patients. RRLSL used the least square regression to identify the scale factors to rank the features in available multiple types of molecular data. RRLSL produced the constraint with kernel functions to bridge the gap between label information and geometry information from messenger RNA and microRNA expression profiling. Importantly, this semi-supervised model proposed the least squares learning with L2 regularization to develop the semi-supervised classifier. RRLSL suggested the performance improvement in the prognostic outcome prediction and successfully discriminated between the recurrent patients and non-recurrent ones.
Affiliation(s)
- Mingguang Shi
- School of Electric Engineering and Automation, Hefei University of Technology, Hefei, Anhui, 230009 China
| | - Zhou Sheng
- School of Electric Engineering and Automation, Hefei University of Technology, Hefei, Anhui, 230009 China
| | - Hao Tang
- School of Electric Engineering and Automation, Hefei University of Technology, Hefei, Anhui, 230009 China
| |
|
35
|
Razaghi-Moghadam Z, Nikoloski Z. Supervised Learning of Gene Regulatory Networks. Curr Protoc Plant Biol 2020; 5:e20106. [PMID: 32207875 DOI: 10.1002/cppb.20106] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 01/28/2023]
Abstract
Identifying the entirety of gene regulatory interactions in a biological system offers the possibility to determine the key molecular factors that affect important traits on the level of cells, tissues, and whole organisms. Despite the development of experimental approaches and technologies for identification of direct binding of transcription factors (TFs) to promoter regions of downstream target genes, computational approaches that utilize large compendia of transcriptomics data are still the predominant methods used to predict direct downstream targets of TFs, and thus reconstruct genome-wide gene-regulatory networks (GRNs). These approaches can broadly be categorized into unsupervised and supervised, based on whether data about known, experimentally verified gene-regulatory interactions are used in the process of reconstructing the underlying GRN. Here, we first describe the generic steps of supervised approaches for GRN reconstruction, since they have been recently shown to result in improved accuracy of the resulting networks. We also illustrate how they can be used with data from model organisms to obtain more accurate prediction of gene regulatory interactions. © 2020 The Authors.
Basic Protocol 1: Construction of features used in supervised learning of gene regulatory interactions
Basic Protocol 2: Learning the non-interacting TF-gene pairs
Basic Protocol 3: Learning a classifier for gene regulatory interactions
Affiliation(s)
- Zahra Razaghi-Moghadam
- Systems Biology and Mathematical Modelling Group, Max Planck Institute of Molecular Plant Physiology, Potsdam, Germany
| | - Zoran Nikoloski
- Systems Biology and Mathematical Modelling Group, Max Planck Institute of Molecular Plant Physiology, Potsdam, Germany; Bioinformatics, Institute of Biochemistry and Biology, University of Potsdam, Potsdam, Germany
| |
|
36
|
Randhawa V, Pathania S. Advancing from protein interactomes and gene co-expression networks towards multi-omics-based composite networks: approaches for predicting and extracting biological knowledge. Brief Funct Genomics 2020; 19:364-376. [PMID: 32678894 DOI: 10.1093/bfgp/elaa015] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 03/16/2020] [Revised: 05/31/2020] [Accepted: 06/15/2020] [Indexed: 01/17/2023] Open
Abstract
Prediction of biological interaction networks from single-omics data has been extensively implemented to understand various aspects of biological systems. However, more recently, there is a growing interest in integrating multi-omics datasets for the prediction of interactomes that provide a global view of biological systems with higher descriptive capability, as compared to single omics. In this review, we have discussed various computational approaches implemented to infer and analyze two of the most important and well studied interactomes: protein-protein interaction networks and gene co-expression networks. We have explicitly focused on recent methods and pipelines implemented to infer and extract biologically important information from these interactomes, starting from utilizing single-omics data and then progressing towards multi-omics data. Accordingly, recent examples and case studies are also briefly discussed. Overall, this review will provide a proper understanding of the latest developments in protein and gene network modelling and will also help in extracting practical knowledge from them.
Affiliation(s)
- Vinay Randhawa
- Department of Biochemistry, Panjab University, Chandigarh, 160014, India
| | - Shivalika Pathania
- Department of Biotechnology, Panjab University, Chandigarh, 160014, India
| |
|
37
|
Razaghi-Moghadam Z, Nikoloski Z. Supervised learning of gene-regulatory networks based on graph distance profiles of transcriptomics data. NPJ Syst Biol Appl 2020; 6:21. [PMID: 32606380 PMCID: PMC7327016 DOI: 10.1038/s41540-020-0140-1] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 07/08/2019] [Accepted: 06/09/2020] [Indexed: 02/07/2023] Open
Abstract
Characterisation of gene-regulatory network (GRN) interactions provides a stepping stone to understanding how genes affect cellular phenotypes. Yet, despite advances in profiling technologies, GRN reconstruction from gene expression data remains a pressing problem in systems biology. Here, we devise a supervised learning approach, GRADIS, which utilises support vector machines to reconstruct GRNs based on distance profiles obtained from a graph representation of transcriptomics data. By employing the data from Escherichia coli and Saccharomyces cerevisiae as well as synthetic networks from the DREAM4 and DREAM5 network inference challenges, we demonstrate that our GRADIS approach outperforms the state-of-the-art supervised and unsupervised approaches. This holds when predictions about target genes for individual transcription factors as well as for the entire network are considered. We employ experimentally verified GRNs from E. coli and S. cerevisiae to validate the predictions and obtain further insights into the performance of the proposed approach. Our GRADIS approach offers the possibility for usage of other network-based representations of large-scale data, and can be readily extended to help the characterisation of other cellular networks, including protein–protein and protein–metabolite interactions.
Affiliation(s)
- Zahra Razaghi-Moghadam
- Bioinformatics, Institute of Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Str. 24-25, 14476, Potsdam, Germany; Systems Biology and Mathematical Modeling group, Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476, Potsdam, Germany
| | - Zoran Nikoloski
- Bioinformatics, Institute of Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Str. 24-25, 14476, Potsdam, Germany; Systems Biology and Mathematical Modeling group, Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476, Potsdam, Germany
| |
|
38
|
Wu S, Cui T, Zhang X, Tian T. A non-linear reverse-engineering method for inferring genetic regulatory networks. PeerJ 2020; 8:e9065. [PMID: 32391205 PMCID: PMC7195839 DOI: 10.7717/peerj.9065] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 01/06/2020] [Accepted: 04/05/2020] [Indexed: 12/19/2022] Open
Abstract
Hematopoiesis is a highly complex developmental process that produces various types of blood cells. This process is regulated by different genetic networks that control the proliferation, differentiation, and maturation of hematopoietic stem cells (HSCs). Although substantial progress has been made in understanding hematopoiesis, the detailed regulatory mechanisms for the fate determination of HSCs remain to be unraveled. In this study, we propose a novel approach to infer the detailed regulatory mechanisms. This work is designed to develop a mathematical framework that is able to realize nonlinear gene expression dynamics accurately. In particular, we intended to investigate the effect of possible protein heterodimers and/or synergistic effect in genetic regulation. This approach includes the Extended Forward Search Algorithm to infer network structure (top-down approach) and a non-linear mathematical model to infer dynamical property (bottom-up approach). Based on the published experimental data, we study two regulatory networks of 11 genes for regulating the erythrocyte differentiation pathway and the neutrophil differentiation pathway. The proposed algorithm is first applied to predict the network topologies among 11 genes and 55 non-linear terms which may be for heterodimers and/or synergistic effect. Then, the unknown model parameters are estimated by fitting simulations to the expression data of two different differentiation pathways. In addition, the edge deletion test is conducted to remove possible insignificant regulations from the inferred networks. Furthermore, the robustness property of the mathematical model is employed as an additional criterion to choose better network reconstruction results. Our simulation results successfully realized experimental data for two different differentiation pathways, which suggests that the proposed approach is an effective method to infer the topological structure and dynamic property of genetic regulations.
Affiliation(s)
- Siyuan Wu
- School of Mathematics, Monash University, Clayton, VIC, Australia
| | - Tiangang Cui
- School of Mathematics, Monash University, Clayton, VIC, Australia
| | - Xinan Zhang
- School of Mathematics and Statistics, Central China Normal University, Wuhan, PR China
| | - Tianhai Tian
- School of Mathematics, Monash University, Clayton, VIC, Australia
| |
|
39
|
Kusano M, Fukushima A, Tabuchi-Kobayashi M, Funayama K, Kojima S, Maruyama K, Yamamoto YY, Nishizawa T, Kobayashi M, Wakazaki M, Sato M, Toyooka K, Osanai-Kondo K, Utsumi Y, Seki M, Fukai C, Saito K, Yamaya T. Cytosolic GLUTAMINE SYNTHETASE1;1 Modulates Metabolism and Chloroplast Development in Roots. PLANT PHYSIOLOGY 2020; 182:1894-1909. [PMID: 32024696 PMCID: PMC7140926 DOI: 10.1104/pp.19.01118] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 09/12/2019] [Accepted: 01/09/2020] [Indexed: 05/31/2023]
Abstract
Nitrogen (N) is an essential macronutrient, and the final form of endogenous inorganic N is ammonium, which is assimilated by Gln synthetase (GS) into Gln. However, how the multiple isoforms of cytosolic GSs contribute to metabolic systems via the regulation of ammonium assimilation remains unclear. In this study, we compared the effects of two rice (Oryza sativa) cytosolic GSs, namely OsGS1;1 and OsGS1;2, on central metabolism in roots using reverse genetics, metabolomic and transcriptomic profiling, and network analyses. We observed (1) abnormal sugar and organic N accumulation and (2) significant up-regulation of genes associated with photosynthesis and chlorophyll biosynthesis in the roots of Osgs1;1 but not Osgs1;2 knockout mutants. Network analysis of the Osgs1;1 mutant suggested that metabolism of Gln was coordinated with the metabolic modules of sugar metabolism, tricarboxylic acid cycle, and carbon fixation. Transcript profiling of Osgs1;1 mutant roots revealed that expression of the rice sigma-factor (OsSIG) genes in the mutants was transiently upregulated. GOLDEN2-LIKE transcription factor-encoding genes, which are involved in chloroplast biogenesis in rice, could not compensate for the lack of OsSIGs in the Osgs1;1 mutant. Microscopic analysis revealed mature chloroplast development in Osgs1;1 roots but not in the roots of Osgs1;2, Osgs1;2-complemented lines, or the wild type. Thus, organic N assimilated by OsGS1;1 affects a broad range of metabolites and transcripts involved in maintaining metabolic homeostasis and plastid development in rice roots, whereas OsGS1;2 has a more specific role, affecting mainly amino acid homeostasis but not carbon metabolism.
Affiliation(s)
- Miyako Kusano
- Graduate School of Life and Environmental Sciences, University of Tsukuba, Tsukuba 305-8572, Japan
- RIKEN Center for Sustainable Resource Science, Tsurumi, Yokohama 230-0045, Japan
- Tsukuba Plant Innovation Research Center, University of Tsukuba, Tsukuba 305-8572, Japan
| | - Atsushi Fukushima
- RIKEN Center for Sustainable Resource Science, Tsurumi, Yokohama 230-0045, Japan
| | | | - Kazuhiro Funayama
- Graduate School of Agricultural Science, Tohoku University, Sendai 981-0845, Japan
| | - Soichi Kojima
- Graduate School of Agricultural Science, Tohoku University, Sendai 981-0845, Japan
| | - Kyonoshin Maruyama
- Biological Resources and Post-Harvest Division, Japan International Research Center for Agricultural Sciences, Tsukuba 305-8686, Japan
| | - Yoshiharu Y Yamamoto
- The United Graduate School of Agricultural Science, Gifu University, Gifu 501-1193, Japan
| | - Tomoko Nishizawa
- RIKEN Center for Sustainable Resource Science, Tsurumi, Yokohama 230-0045, Japan
| | - Makoto Kobayashi
- RIKEN Center for Sustainable Resource Science, Tsurumi, Yokohama 230-0045, Japan
| | - Mayumi Wakazaki
- RIKEN Center for Sustainable Resource Science, Tsurumi, Yokohama 230-0045, Japan
| | - Mayuko Sato
- RIKEN Center for Sustainable Resource Science, Tsurumi, Yokohama 230-0045, Japan
| | - Kiminori Toyooka
- RIKEN Center for Sustainable Resource Science, Tsurumi, Yokohama 230-0045, Japan
| | - Kumiko Osanai-Kondo
- RIKEN Center for Sustainable Resource Science, Tsurumi, Yokohama 230-0045, Japan
| | - Yoshinori Utsumi
- RIKEN Center for Sustainable Resource Science, Tsurumi, Yokohama 230-0045, Japan
| | - Motoaki Seki
- RIKEN Center for Sustainable Resource Science, Tsurumi, Yokohama 230-0045, Japan
| | - Chihaya Fukai
- Graduate School of Life and Environmental Sciences, University of Tsukuba, Tsukuba 305-8572, Japan
| | - Kazuki Saito
- RIKEN Center for Sustainable Resource Science, Tsurumi, Yokohama 230-0045, Japan
- Graduate School of Pharmaceutical Sciences, Chiba University, Chiba 260-8675, Japan
| | - Tomoyuki Yamaya
- Graduate School of Agricultural Science, Tohoku University, Sendai 981-0845, Japan
| |
|
40
|
Chen X, Li M, Zheng R, Wu FX, Wang J. D3GRN: a data driven dynamic network construction method to infer gene regulatory networks. BMC Genomics 2019; 20:929. [PMID: 31881937 PMCID: PMC6933629 DOI: 10.1186/s12864-019-6298-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 12/30/2022] Open
Abstract
BACKGROUND To infer gene regulatory networks (GRNs) from gene-expression data is still a fundamental and challenging problem in systems biology. Several existing algorithms formulate GRNs inference as a regression problem and obtain the network with an ensemble strategy. Recent studies on data driven dynamic network construction provide us a new perspective to solve the regression problem. RESULTS In this study, we propose a data driven dynamic network construction method to infer gene regulatory network (D3GRN), which transforms the regulatory relationship of each target gene into functional decomposition problem and solves each sub problem by using the Algorithm for Revealing Network Interactions (ARNI). To remedy the limitation of ARNI in constructing networks solely from the unit level, a bootstrapping and area based scoring method is taken to infer the final network. On DREAM4 and DREAM5 benchmark datasets, D3GRN performs competitively with the state-of-the-art algorithms in terms of AUPR. CONCLUSIONS We have proposed a novel data driven dynamic network construction method by combining ARNI with bootstrapping and area based scoring strategy. The proposed method performs well on the benchmark datasets, contributing as a competitive method to infer gene regulatory networks in a new perspective.
Affiliation(s)
- Xiang Chen
- School of Computer Science and Engineering, Central South University, Changsha, China
| | - Min Li
- School of Computer Science and Engineering, Central South University, Changsha, China.
| | - Ruiqing Zheng
- School of Computer Science and Engineering, Central South University, Changsha, China
| | - Fang-Xiang Wu
- Department of Mechanical Engineering and Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada
| | - Jianxin Wang
- School of Computer Science and Engineering, Central South University, Changsha, China
| |
|
41
|
Bhuva DD, Cursons J, Smyth GK, Davis MJ. Differential co-expression-based detection of conditional relationships in transcriptional data: comparative analysis and application to breast cancer. Genome Biol 2019; 20:236. [PMID: 31727119 PMCID: PMC6857226 DOI: 10.1186/s13059-019-1851-8] [Citation(s) in RCA: 45] [Impact Index Per Article: 7.5] [Received: 06/04/2019] [Accepted: 10/02/2019] [Indexed: 01/05/2023] Open
Abstract
BACKGROUND Elucidation of regulatory networks, including identification of regulatory mechanisms specific to a given biological context, is a key aim in systems biology. This has motivated the move from co-expression to differential co-expression analysis, and numerous methods have subsequently been developed to address this task; however, evaluation of methods and interpretation of the resulting networks have been hindered by the lack of known context-specific regulatory interactions. RESULTS In this study, we develop a simulator based on dynamical systems modelling capable of simulating differential co-expression patterns. With the simulator and an evaluation framework, we benchmark and characterise the performance of inference methods. Defining three different levels of "true" networks for each simulation, we show that accurate inference of causation is difficult for all methods, compared to inference of associations. We show that a z-score-based method has the best general performance. Further, analysis of simulation parameters reveals five network and simulation properties that explain the performance of methods. The evaluation framework and inference methods used in this study are available in the dcanr R/Bioconductor package. CONCLUSIONS Our analysis of networks inferred from simulated data shows that hub nodes are more likely to be differentially regulated targets than transcription factors. Based on this observation, we propose an interpretation of the inferred differential network that can reconstruct a putative causal network.
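The abstract does not spell out the z-score statistic, so the following is only a minimal sketch of one common formulation of a z-score test for differential co-expression: Fisher-transform the correlation of a gene pair in each condition and scale the difference by its approximate standard error. The function names (`pearson`, `diff_coexpression_z`) and the toy expression values are illustrative assumptions, and the dcanr implementation may differ in its details (for example, in the correlation measure used).

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def diff_coexpression_z(x1, y1, x2, y2):
    """z-score for the change in correlation of a gene pair between
    condition 1 (x1, y1) and condition 2 (x2, y2).
    Assumes more than 3 samples per condition and |r| < 1."""
    n1, n2 = len(x1), len(x2)
    z1 = math.atanh(pearson(x1, y1))  # Fisher transform, condition 1
    z2 = math.atanh(pearson(x2, y2))  # Fisher transform, condition 2
    return (z1 - z2) / math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))

# Hypothetical expression values: the pair is positively correlated in
# condition A and negatively correlated in condition B.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
gene_b_cond_a = [1.2, 1.9, 3.2, 3.8, 5.1, 6.3]
gene_b_cond_b = [6.1, 4.8, 4.2, 2.9, 2.2, 0.8]

z = diff_coexpression_z(gene_a, gene_b_cond_a, gene_a, gene_b_cond_b)
print(f"differential co-expression z-score: {z:.2f}")
```

A large absolute z-score suggests that the association between the two genes differs between conditions; as the abstract notes, this indicates a differential association rather than a causal regulatory link.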
Affiliation(s)
- Dharmesh D Bhuva
- Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, 3052, Australia.,School of Mathematics and Statistics, Faculty of Science, University of Melbourne, Melbourne, VIC, 3010, Australia
| | - Joseph Cursons
- Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, 3052, Australia.,Department of Medical Biology, Faculty of Medicine, Dentistry and Health Sciences, University of Melbourne, Melbourne, VIC, 3010, Australia
| | - Gordon K Smyth
- Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, 3052, Australia.,School of Mathematics and Statistics, Faculty of Science, University of Melbourne, Melbourne, VIC, 3010, Australia
| | - Melissa J Davis
- Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, 3052, Australia.,Department of Medical Biology, Faculty of Medicine, Dentistry and Health Sciences, University of Melbourne, Melbourne, VIC, 3010, Australia.,Department of Clinical Pathology, Faculty of Medicine, Dentistry and Health Sciences, University of Melbourne, Melbourne, VIC, 3010, Australia.
| |
|
42
|
Pliakos K, Vens C. Network inference with ensembles of bi-clustering trees. BMC Bioinformatics 2019; 20:525. [PMID: 31660848 PMCID: PMC6819564 DOI: 10.1186/s12859-019-3104-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 02/28/2019] [Accepted: 09/20/2019] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND Network inference is crucial for biomedicine and systems biology. Biological entities and their associations are often modeled as interaction networks. Examples include drug-protein interaction or gene regulatory networks. Studying and elucidating such networks can lead to the comprehension of complex biological processes. However, we usually have only partial knowledge of those networks, and the experimental identification of all the existing associations between biological entities is very time consuming and particularly expensive. Many computational approaches have been proposed over the years for network inference; nonetheless, efficiency and accuracy remain open problems. Here, we propose bi-clustering tree ensembles as a new machine learning method for network inference, extending the traditional tree-ensemble models to the global network setting. The proposed approach addresses the network inference problem as a multi-label classification task. More specifically, the nodes of a network (e.g., drugs or proteins in a drug-protein interaction network) are modelled as samples described by features (e.g., chemical structure similarities or protein sequence similarities). The labels in our setting represent the presence or absence of links connecting the nodes of the interaction network (e.g., drug-protein interactions in a drug-protein interaction network). RESULTS We extended traditional tree-ensemble methods, such as extremely randomized trees (ERT) and random forests (RF), to ensembles of bi-clustering trees, integrating background information from both node sets of a heterogeneous network into the same learning framework. We performed an empirical evaluation, comparing the proposed approach to currently used tree-ensemble based approaches as well as other approaches from the literature. We demonstrated the effectiveness of our approach in different interaction prediction (network inference) settings. For evaluation purposes, we used several benchmark datasets that represent drug-protein and gene regulatory networks. We also applied our proposed method to two versions of a chemical-protein association network extracted from the STITCH database, demonstrating the potential of our model in predicting non-reported interactions. CONCLUSIONS Bi-clustering trees outperform existing tree-based strategies as well as machine learning methods based on other algorithms. Since our approach is based on tree-ensembles, it inherits the advantages of tree-ensemble learning, such as handling of missing values, scalability and interpretability.
Affiliation(s)
- Konstantinos Pliakos
- KU Leuven, Campus KULAK, Department of Public Health and Primary Care, Faculty of Medicine, Kortrijk, Belgium.,ITEC, imec research group at KU Leuven, Kortrijk, Belgium.
| | - Celine Vens
- KU Leuven, Campus KULAK, Department of Public Health and Primary Care, Faculty of Medicine, Kortrijk, Belgium.,ITEC, imec research group at KU Leuven, Kortrijk, Belgium
| |
|
43
|
Muldoon JJ, Yu JS, Fassia MK, Bagheri N. Network inference performance complexity: a consequence of topological, experimental and algorithmic determinants. Bioinformatics 2019; 35:3421-3432. [PMID: 30932143 PMCID: PMC6748731 DOI: 10.1093/bioinformatics/btz105] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 09/12/2018] [Revised: 01/24/2019] [Accepted: 02/11/2019] [Indexed: 12/21/2022] Open
Abstract
MOTIVATION Network inference algorithms aim to uncover key regulatory interactions governing cellular decision-making, disease progression and therapeutic interventions. Having an accurate blueprint of this regulation is essential for understanding and controlling cell behavior. However, the utility and impact of these approaches are limited because the ways in which various factors shape inference outcomes remain largely unknown. RESULTS We identify and systematically evaluate determinants of performance (including network properties, experimental design choices and data processing) by developing new metrics that quantify confidence across algorithms in comparable terms. We conducted a multifactorial analysis that demonstrates how stimulus target, regulatory kinetics, induction and resolution dynamics, and noise differentially impact widely used algorithms in significant and previously unrecognized ways. The results show that even when high-quality data are paired with high-performing algorithms, inferred models can sometimes yield misleading conclusions. Lastly, we validate these findings and the utility of the confidence metrics using realistic in silico gene regulatory networks. This new characterization approach provides a way to more rigorously interpret how algorithms infer regulation from biological datasets. AVAILABILITY AND IMPLEMENTATION Code is available at http://github.com/bagherilab/networkinference/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Joseph J Muldoon
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL, USA
- Interdisciplinary Biological Sciences Program, Northwestern University, Evanston, IL, USA
| | - Jessica S Yu
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL, USA
| | - Mohammad-Kasim Fassia
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL, USA
- Department of Biomedical Engineering, Northwestern University, Evanston, IL, USA
| | - Neda Bagheri
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL, USA
- Interdisciplinary Biological Sciences Program, Northwestern University, Evanston, IL, USA
- Center for Synthetic Biology, Northwestern University, Evanston, IL, USA
- Chemistry of Life Processes Institute, Northwestern University, Evanston, IL, USA
- Northwestern Institute on Complex Systems, Northwestern University, Evanston, IL, USA
| |
|
44
|
Tang BM, Shojaei M, Teoh S, Meyers A, Ho J, Ball TB, Keynan Y, Pisipati A, Kumar A, Eisen DP, Lai K, Gillett M, Santram R, Geffers R, Schreiber J, Mozhui K, Huang S, Parnell GP, Nalos M, Holubova M, Chew T, Booth D, Kumar A, McLean A, Schughart K. Neutrophils-related host factors associated with severe disease and fatality in patients with influenza infection. Nat Commun 2019; 10:3422. [PMID: 31366921 PMCID: PMC6668409 DOI: 10.1038/s41467-019-11249-y] [Citation(s) in RCA: 111] [Impact Index Per Article: 18.5] [Received: 10/22/2018] [Accepted: 06/28/2019] [Indexed: 11/09/2022] Open
Abstract
Severe influenza infection has no effective treatment available. One of the key barriers to developing host-directed therapy is a lack of reliable prognostic factors needed to guide such therapy. Here, we use a network analysis approach to identify host factors associated with severe influenza and fatal outcome. In influenza patients with moderate-to-severe diseases, we uncover a complex landscape of immunological pathways, with the main changes occurring in pathways related to circulating neutrophils. Patients with severe disease display excessive neutrophil extracellular traps formation, neutrophil-inflammation and delayed apoptosis, all of which have been associated with fatal outcome in animal models. Excessive neutrophil activation correlates with worsening oxygenation impairment and predicted fatal outcome (AUROC 0.817-0.898). These findings provide new evidence that neutrophil-dominated host response is associated with poor outcomes. Measuring neutrophil-related changes may improve risk stratification and patient selection, a critical first step in developing host-directed immune therapy.
Affiliation(s)
- Benjamin M Tang
- Department of Intensive Care Medicine, Nepean Hospital, Sydney, Australia.,Centre for Immunology and Allergy Research, The Westmead Institute for Medical Research, Sydney, Australia.,Respiratory Tract Infection Research Node, Marie Bashir Institute for Infectious Diseases and Biosecurity, Sydney, Australia.
| | - Maryam Shojaei
- Department of Intensive Care Medicine, Nepean Hospital, Sydney, Australia.,Centre for Immunology and Allergy Research, The Westmead Institute for Medical Research, Sydney, Australia
| | - Sally Teoh
- Department of Intensive Care Medicine, Nepean Hospital, Sydney, Australia
| | - Adrienne Meyers
- National HIV and Retrovirology Laboratories, JC Wilt Infectious Disease Research Centre, Public Health Agency of Canada, Department of Medical Microbiology and Infectious Disease, University of Manitoba, Winnipeg, Canada
| | - John Ho
- National HIV and Retrovirology Laboratories, JC Wilt Infectious Disease Research Centre, Public Health Agency of Canada, Department of Medical Microbiology and Infectious Disease, University of Manitoba, Winnipeg, Canada
| | - T Blake Ball
- National HIV and Retrovirology Laboratories, JC Wilt Infectious Disease Research Centre, Public Health Agency of Canada, Department of Medical Microbiology and Infectious Disease, University of Manitoba, Winnipeg, Canada
| | - Yoav Keynan
- Department of Internal Medicine, Medical Microbiology and Community Health Sciences, University of Manitoba, Winnipeg, Canada
| | - Amarnath Pisipati
- Department of Chemistry and Biological Chemistry, Harvard University, Cambridge, MA, USA
| | - Aseem Kumar
- Department of Chemistry and Biochemistry, Laurentian University, Laurentian, Canada
| | - Damon P Eisen
- Townsville Hospital, Townsville, Queensland, Australia
| | - Kevin Lai
- Department of Emergency Medicine, Westmead Hospital, Sydney, Australia
| | - Mark Gillett
- Department of Emergency Medicine, Royal North Shore Hospital, Sydney, Australia
| | - Rahul Santram
- Department of Emergency Medicine, St. Vincent Hospital, Sydney, Australia
| | - Robert Geffers
- Genome Analytics, Helmholtz Centre for Infection Research, Braunschweig, Germany
| | - Jens Schreiber
- Otto-von-Guericke University of Magdeburg, Clinic of Pneumology, Magdeburg, Germany
| | - Khyobeni Mozhui
- Department of Preventive Medicine, University of Tennessee Health Science Centre, Memphis, TN, USA
| | - Stephen Huang
- Department of Intensive Care Medicine, Nepean Hospital, Sydney, Australia
| | - Grant P Parnell
- Centre for Immunology and Allergy Research, The Westmead Institute for Medical Research, Sydney, Australia
| | - Marek Nalos
- Department of Intensive Care Medicine, Nepean Hospital, Sydney, Australia.,Department of Internal Medicine, Medical Faculty Plzen, Charles University Prague, Staré Město, Czech Republic
| | - Monika Holubova
- Biomedical Centre, Medical Faculty Plzen, Charles University Prague, Staré Město, Czech Republic
| | - Tracy Chew
- Sydney Informatic Hub, The University of Sydney, Sydney, Australia
| | - David Booth
- Centre for Immunology and Allergy Research, The Westmead Institute for Medical Research, Sydney, Australia
| | - Anand Kumar
- Section of Critical Care Medicine and Section of Infectious Diseases, Departments of Medicine, Medical Microbiology and Pharmacology, University of Manitoba, Winnipeg, Canada
| | - Anthony McLean
- Department of Intensive Care Medicine, Nepean Hospital, Sydney, Australia
| | - Klaus Schughart
- Department of Infection Genetics, Helmholtz Centre for Infection Research, Braunschweig, Germany.,University of Veterinary Medicine, Hannover, Germany.,Department of Microbiology, Immunology and Biochemistry, University of Tennessee Health Science Centre, Memphis, TN, USA
| |
|
45
|
Jansen JE, Gaffney EA, Wagg J, Coles MC. Combining Mathematical Models With Experimentation to Drive Novel Mechanistic Insights Into Macrophage Function. Front Immunol 2019; 10:1283. [PMID: 31244837 PMCID: PMC6563075 DOI: 10.3389/fimmu.2019.01283] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 12/20/2018] [Accepted: 05/20/2019] [Indexed: 12/20/2022] Open
Abstract
This perspective outlines an approach to improve mechanistic understanding of macrophages in inflammation and tissue homeostasis, with a focus on human inflammatory bowel disease (IBD). The approach integrates wet-lab and in-silico experimentation, driven by mechanistic mathematical models of relevant biological processes. Although wet-lab experimentation with genetically modified mouse models and primary human cells and tissues has provided important insights, the role of macrophages in human IBD remains poorly understood. Key open questions include: (1) To what degree do hyperinflammatory processes (e.g., gain of cytokine production) and immunodeficiency (e.g., loss of bacterial killing) intersect to drive IBD pathophysiology? and (2) What are the roles of macrophage heterogeneity in IBD onset and progression? Mathematical modeling offers a synergistic approach that can be used to address such questions. Mechanistic models are useful for informing wet-lab experimental designs and provide a knowledge-constrained framework for quantitative analysis and interpretation of the resulting experimental data. The majority of published mathematical models of macrophage function are based either on animal models or on immortalized human cell lines. These experimental models do not recapitulate important features of human gastrointestinal pathophysiology and are therefore limited in the extent to which they can inform understanding of human IBD. Thus, we envision a future where mechanistic mathematical models are based on features relevant to human disease and parametrized by richer human datasets, including biopsy tissues taken from IBD patients, human organ-on-a-chip systems and other high-throughput clinical data derived from experimental medicine studies and/or clinical trials on IBD patients.
Affiliation(s)
- Joanneke E Jansen
- Mathematical Institute, University of Oxford, Oxford, United Kingdom.,Kennedy Institute of Rheumatology, University of Oxford, Oxford, United Kingdom
| | - Eamonn A Gaffney
- Mathematical Institute, University of Oxford, Oxford, United Kingdom
| | | | - Mark C Coles
- Kennedy Institute of Rheumatology, University of Oxford, Oxford, United Kingdom
| |
|
46
|
Haque S, Ahmad JS, Clark NM, Williams CM, Sozzani R. Computational prediction of gene regulatory networks in plant growth and development. CURRENT OPINION IN PLANT BIOLOGY 2019; 47:96-105. [PMID: 30445315 DOI: 10.1016/j.pbi.2018.10.005] [Citation(s) in RCA: 44] [Impact Index Per Article: 7.3] [Received: 08/15/2018] [Revised: 10/05/2018] [Accepted: 10/18/2018] [Indexed: 05/22/2023]
Abstract
Plants integrate a wide range of cellular, developmental, and environmental signals to regulate complex patterns of gene expression. Recent advances in genomic technologies enable differential gene expression analysis at a systems level, allowing for improved inference of the network of regulatory interactions between genes. These gene regulatory networks, or GRNs, are used to visualize the causal regulatory relationships between regulators and their downstream target genes. Accordingly, these GRNs can represent spatial, temporal, and/or environmental regulations and can identify functional genes. This review summarizes recent computational approaches applied to different types of gene expression data to infer GRNs in the context of plant growth and development. Three stages of GRN inference are described: first, data collection and analysis based on the dataset type; second, network inference application based on data availability and proposed hypotheses; and third, validation based on in silico, in vivo, and in planta methods. In addition, this review relates data collection strategies to biological questions, organizes inference algorithms based on statistical methods and data types, discusses experimental design considerations, and provides guidelines for GRN inference with an emphasis on the benefits of integrative approaches, especially when a priori information is limited. Finally, this review concludes that computational frameworks integrating large-scale heterogeneous datasets are needed for a more accurate (e.g. fewer false interactions), detailed (e.g. discrimination between direct versus indirect interactions), and comprehensive (e.g. genetic regulation under various conditions and spatial locations) inference of GRNs.
Affiliation(s)
- Samiul Haque
- Electrical and Computer Engineering, North Carolina State University, Raleigh, USA
| | - Jabeen S Ahmad
- Plant and Microbial Biology, North Carolina State University, Raleigh, USA
| | - Natalie M Clark
- Plant and Microbial Biology, North Carolina State University, Raleigh, USA
| | - Cranos M Williams
- Electrical and Computer Engineering, North Carolina State University, Raleigh, USA.
| | - Rosangela Sozzani
- Plant and Microbial Biology, North Carolina State University, Raleigh, USA.
| |
|
47
|
Huynh-Thu VA, Geurts P. Unsupervised Gene Network Inference with Decision Trees and Random Forests. Methods Mol Biol 2019; 1883:195-215. [PMID: 30547401 DOI: 10.1007/978-1-4939-8882-2_8] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 12/11/2022]
Abstract
In this chapter, we introduce the reader to a popular family of machine learning algorithms, called decision trees. We then review several approaches based on decision trees that have been developed for the inference of gene regulatory networks (GRNs). Decision trees have indeed several nice properties that make them well-suited for tackling this problem: they are able to detect multivariate interacting effects between variables, are non-parametric, have good scalability, and have very few parameters. In particular, we describe in detail the GENIE3 algorithm, a state-of-the-art method for GRN inference.
Affiliation(s)
- Vân Anh Huynh-Thu
- Department of Electrical Engineering and Computer Science, University of Liège, Liège, Belgium.
| | - Pierre Geurts
- Department of Electrical Engineering and Computer Science, University of Liège, Liège, Belgium
| |
|
48
|
Tang Y, Li M, Sun J, Zhang T, Zhang J, Zheng P. TRCMGene: A two-step referential compression method for the efficient storage of genetic data. PLoS One 2018; 13:e0206521. [PMID: 30395579 PMCID: PMC6218042 DOI: 10.1371/journal.pone.0206521] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 03/21/2018] [Accepted: 10/08/2018] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND The massive quantities of genetic data generated by high-throughput sequencing pose challenges to data storage, transmission and analysis. These problems can be addressed effectively through data compression, which reduces storage size and speeds up data transmission. Several options are available for compressing and storing genetic data. However, most of these options either do not provide sufficient compression rates or require a considerable length of time for decompression and loading. RESULTS Here, we propose TRCMGene, a lossless genetic data compression method that uses a referential compression scheme. TRCMGene introduces a novel two-step compression method, which builds an index structure using K-means and k-nearest neighbours. Evaluation with several real datasets revealed that the compression factor of TRCMGene ranges from 9 to 21. TRCMGene presents a good balance between compression factor and reading time. On average, the reading time of compressed data is 60% of that of uncompressed data. Thus, TRCMGene not only saves disc space but also saves file access time and speeds up data loading. These effects collectively improve genetic data storage and transmission in the current hardware environment and render system upgrades unnecessary. TRCMGene, its user manual and demos can be accessed freely from https://github.com/tangyou79/TRCM. The data mentioned in this manuscript can be downloaded from: https://github.com/tangyou79/TRCM/wiki.
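To illustrate the general idea of lossless referential compression mentioned in the abstract, here is a deliberately simplified sketch: a target sequence is stored only as the positions where it differs from a reference of the same length. This is not the TRCMGene algorithm (its two-step index built with K-means and k-nearest neighbours is not reproduced here); the function names and the toy sequences are assumptions made for illustration.

```python
def compress(reference, target):
    """Encode target as a list of (position, base) mismatches against
    an equal-length reference. Simplified: substitutions only."""
    if len(reference) != len(target):
        raise ValueError("this sketch only handles equal-length sequences")
    return [(i, t) for i, (r, t) in enumerate(zip(reference, target)) if r != t]

def decompress(reference, diffs):
    """Rebuild the target exactly from the reference and the stored diffs."""
    seq = list(reference)
    for position, base in diffs:
        seq[position] = base
    return "".join(seq)

# Hypothetical toy sequences.
reference = "ACGTACGTAC"
target = "ACGTTCGTAA"

diffs = compress(reference, target)
print(diffs)                                   # the stored differences
print(decompress(reference, diffs) == target)  # lossless round trip
```

The closer the target is to the reference, the fewer differences need to be stored, which is why the choice of reference matters for the achievable compression factor.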
Affiliation(s)
- You Tang
  - Electrical and Information Engineering College, JiLin Agricultural Science and Technology University, Jilin, China
- Min Li
  - College of Electrical and Information, Northeast Agricultural University, Harbin, China
- Jing Sun
  - College of Life Science and Agriculture, Qiqihar University, Qiqihar, China
- Tao Zhang
  - College of Electrical and Information, Northeast Agricultural University, Harbin, China
- Jicheng Zhang
  - College of Electrical and Information, Northeast Agricultural University, Harbin, China
  - * E-mail: (JCZ); (PZ)
- Ping Zheng
  - College of Electrical and Information, Northeast Agricultural University, Harbin, China
  - * E-mail: (JCZ); (PZ)
|
49
|
Stock M, Pahikkala T, Airola A, Waegeman W, De Baets B. Algebraic shortcuts for leave-one-out cross-validation in supervised network inference. Brief Bioinform 2018; 21:262-271. [PMID: 30329015 DOI: 10.1093/bib/bby095] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 06/27/2018] [Revised: 08/21/2018] [Accepted: 09/06/2018] [Indexed: 12/20/2022]
Abstract
Supervised machine learning techniques have traditionally been very successful at reconstructing biological networks, such as protein-ligand interaction, protein-protein interaction and gene regulatory networks. Many supervised techniques for network prediction use linear models on a possibly nonlinear pairwise feature representation of edges. Recently, much emphasis has been placed on the correct evaluation of such supervised models. It is vital to distinguish between using a model to predict new interactions in a given network and using it to predict interactions for a new vertex not present in the original network. This distinction matters because (i) the performance might differ dramatically between the prediction settings and (ii) tuning the model hyperparameters to obtain the best possible model depends on the setting of interest. Specific cross-validation schemes need to be used to assess the performance in such different prediction settings. In this work we discuss a state-of-the-art kernel-based network inference technique called two-step kernel ridge regression. We show that this regression model can be trained efficiently, with a time complexity scaling with the number of vertices rather than the number of edges. Furthermore, this framework leads to a series of cross-validation shortcuts that allow one to rapidly estimate the model performance for any relevant network prediction setting. This allows computational biologists to fully assess the capabilities of their models. The machine learning techniques with the algebraic shortcuts are implemented in the RLScore software package: https://github.com/aatapa/RLScore.
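The kind of shortcut this abstract describes can be illustrated with a minimal NumPy sketch. It is not the RLScore API, and every function name below is hypothetical. It assumes the commonly stated form of two-step kernel ridge regression, in which predictions are F = H_k Y H_g with hat matrices H_k = K(K + λ_k I)^-1 and H_g = G(G + λ_g I)^-1 built from the two vertex kernels K and G; the abstract itself does not give this formula. Under that assumption, holding out all labels of one row vertex only requires the classical ridge leave-one-out identity in the first step, so no model is retrained and the pairwise (edge-level) kernel is never formed.

```python
import numpy as np


def hat_matrix(kernel, lam):
    """Return H = K (K + lam*I)^-1 for a symmetric PSD kernel matrix K."""
    n = kernel.shape[0]
    # K and (K + lam*I)^-1 commute, so solving from the left yields the same H.
    return np.linalg.solve(kernel + lam * np.eye(n), kernel)


def two_step_predict(K, G, Y, lam_k, lam_g):
    """In-sample two-step KRR predictions (assumed form): H_k @ Y @ H_g."""
    return hat_matrix(K, lam_k) @ Y @ hat_matrix(G, lam_g)


def loo_rows_shortcut(K, G, Y, lam_k, lam_g):
    """Leave-one-row-vertex-out predictions without retraining.

    Uses the ridge identity f_{-i}(x_i) = (yhat_i - H_ii * y_i) / (1 - H_ii)
    on the first step, then applies the second-step hat matrix.
    """
    Hk = hat_matrix(K, lam_k)
    Hg = hat_matrix(G, lam_g)
    d = np.diag(Hk)[:, None]  # leverage of each row vertex
    step_one = (Hk @ Y - d * Y) / (1.0 - d)
    return step_one @ Hg


def loo_rows_naive(K, G, Y, lam_k, lam_g):
    """Reference implementation: refit the first step once per held-out row."""
    n = K.shape[0]
    Hg = hat_matrix(G, lam_g)
    out = np.empty(Y.shape, dtype=float)
    for i in range(n):
        keep = np.arange(n) != i
        K_sub = K[np.ix_(keep, keep)]
        alpha = np.linalg.solve(K_sub + lam_k * np.eye(n - 1), Y[keep])
        out[i] = (K[i, keep] @ alpha) @ Hg
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.normal(size=(8, 3))   # hypothetical features of row vertices
    B = rng.normal(size=(6, 4))   # hypothetical features of column vertices
    K, G = A @ A.T, B @ B.T       # linear kernels, for illustration only
    Y = rng.normal(size=(8, 6))   # toy interaction matrix
    fast = loo_rows_shortcut(K, G, Y, 0.5, 2.0)
    slow = loo_rows_naive(K, G, Y, 0.5, 2.0)
    print("max abs difference:", np.abs(fast - slow).max())
```

The naive loop solves one linear system per held-out vertex, while the shortcut reuses a single hat matrix; comparing the two on small random data is a quick sanity check before relying on the shortcut for hyperparameter tuning.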
Affiliation(s)
- Michiel Stock
  - Department of Data Analysis and Mathematical Modelling, Ghent University, Belgium
- Tapio Pahikkala
  - Department of Future Technologies, University of Turku, Finland
- Antti Airola
  - Department of Future Technologies, University of Turku, Finland
- Willem Waegeman
  - Department of Data Analysis and Mathematical Modelling, Ghent University, Belgium
- Bernard De Baets
  - Department of Data Analysis and Mathematical Modelling, Ghent University, Belgium
|
50
|
Siahpirani AF, Roy S. A prior-based integrative framework for functional transcriptional regulatory network inference. Nucleic Acids Res 2018; 45:e21. [PMID: 27794550 PMCID: PMC5389674 DOI: 10.1093/nar/gkw963] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 11/29/2015] [Accepted: 10/12/2016] [Indexed: 12/16/2022]
Abstract
Transcriptional regulatory networks specify regulatory proteins controlling the context-specific expression levels of genes. Inference of genome-wide regulatory networks is central to understanding gene regulation, but remains an open challenge. Expression-based network inference is among the most popular methods to infer regulatory networks; however, networks inferred from such methods have low overlap with experimentally derived (e.g. ChIP-chip and transcription factor (TF) knockouts) networks. Currently we have a limited understanding of this discrepancy. To address this gap, we first develop a regulatory network inference algorithm, based on probabilistic graphical models, to integrate expression with auxiliary datasets supporting a regulatory edge. Second, we comprehensively analyze our method and other state-of-the-art methods on different expression perturbation datasets. Networks inferred by integrating sequence-specific motifs with expression have substantially greater agreement with experimentally derived networks, while remaining more predictive of expression than motif-based networks. Our analysis suggests natural genetic variation as the most informative perturbation for network inference, and identifies core TFs whose targets are predictable from expression. Multiple reasons make the identification of targets of other TFs difficult, including network architecture and insufficient variation of TF mRNA level. Finally, we demonstrate the utility of our inference algorithm to infer stress-specific regulatory networks and for regulator prioritization.
Affiliation(s)
- Alireza F Siahpirani
  - Department of Computer Sciences, University of Wisconsin-Madison, 1210 W. Dayton St. Madison, WI 53706-1613, USA
- Sushmita Roy
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Discovery Building 330 North Orchard St. Madison, WI 53715, USA
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, K6/446 Clinical Sciences Center 600 Highland Avenue Madison, WI 53792-4675, USA
|