1
Che J, Jin Y, Gragnoli C, Yau ST, Wu R. IdopNetwork as a genomic predictor of drug response. Drug Discov Today 2025; 30:104252. [PMID: 39603519 DOI: 10.1016/j.drudis.2024.104252] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2024] [Revised: 11/13/2024] [Accepted: 11/21/2024] [Indexed: 11/29/2024]
Abstract
Despite being challenging, elucidating the systematic control mechanisms of multifactorial drug responses is crucial for pharmacogenomic research. We describe a new form of statistical mechanics to reconstruct informative, dynamic, omnidirectional, and personalized networks (idopNetworks), which cover all pharmacogenomic factors and their interconnections, interdependence, and mechanistic roles. IdopNetworks can characterize how cell-cell crosstalk is mediated by genes and proteins to shape body-drug interactions and identify key roadmaps of information flow and propagation for determining drug efficacy and toxicity. We argue that idopNetworks could potentially provide insight into the genomic machinery of drug responses and provide scientific guidance for the design of drugs whose potency is maximized at lower doses.
Affiliation(s)
- Jincan Che
- School of Grassland Science, Beijing Forestry University, Beijing 100083, China; Beijing Institute of Mathematical Sciences and Applications, Beijing 101408, China
- Yuebo Jin
- Department of Mathematics, Brandeis University, Waltham, MA 02453, USA
- Claudia Gragnoli
- Department of Public Health Sciences, Penn State College of Medicine, Hershey, PA 17033, USA; Department of Medicine, Creighton University School of Medicine, Omaha, NE 68124, USA; Molecular Biology Laboratory, Bios Biotech Multi-Diagnostic Health Center, Rome 00197, Italy
- Shing-Tung Yau
- Beijing Institute of Mathematical Sciences and Applications, Beijing 101408, China; Yau Mathematical Sciences Center, Tsinghua University, Beijing 100084, China; Shanghai Institute for Mathematics and Interdisciplinary Sciences, Shanghai 200433, China
- Rongling Wu
- Beijing Institute of Mathematical Sciences and Applications, Beijing 101408, China; Yau Mathematical Sciences Center, Tsinghua University, Beijing 100084, China; Shanghai Institute for Mathematics and Interdisciplinary Sciences, Shanghai 200433, China.
2
Schmid G, Gottwald S, Braun DA. Bounded Rational Decision Networks With Belief Propagation. Neural Comput 2024; 37:76-127. [PMID: 39383021 DOI: 10.1162/neco_a_01719] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2023] [Accepted: 07/08/2024] [Indexed: 10/11/2024]
Abstract
Complex information processing systems that are capable of a wide variety of tasks, such as the human brain, are composed of specialized units that collaborate and communicate with each other. An important property of such information processing networks is locality: there is no single global unit controlling the modules, but information is exchanged locally. Here, we consider a decision-theoretic approach to study networks of bounded rational decision makers that are allowed to specialize and communicate with each other. In contrast to previous work that has focused on feedforward communication between decision-making agents, we consider cyclical information processing paths allowing for back-and-forth communication. We adapt message-passing algorithms to suit this purpose, essentially allowing for local information flow between units and thus enabling circular dependency structures. We provide examples that show how repeated communication can increase performance given that each unit's information processing capability is limited and that decision-making systems with too few or too many connections and feedback loops achieve suboptimal utility.
Affiliation(s)
- Gerrit Schmid
- Ulm University Institute of Neuroinformatics, 89081 Ulm, Germany
- Daniel A Braun
- Ulm University Institute of Neuroinformatics, 89081 Ulm, Germany
3
Hou L, Geng Z, Yuan Z, Shi X, Wang C, Chen F, Li H, Xue F. MRSL: a causal network pruning algorithm based on GWAS summary data. Brief Bioinform 2024; 25:bbae086. [PMID: 38487847 PMCID: PMC10940843 DOI: 10.1093/bib/bbae086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2023] [Revised: 02/01/2024] [Accepted: 02/15/2024] [Indexed: 03/18/2024] Open
Abstract
Causal discovery is a powerful tool to disclose underlying structures by analyzing purely observational data. Genetic variants can provide useful complementary information for structure learning. Recently, Mendelian randomization (MR) studies have provided abundant marginal causal relationships of traits. Here, we propose a causal network pruning algorithm, MRSL (MR-based structure learning algorithm), based on these marginal causal relationships. MRSL combines graph theory with multivariable MR to learn the conditional causal structure using only genome-wide association analyses (GWAS) summary statistics. Specifically, MRSL utilizes topological sorting to improve the precision of structure learning. It proposes MR-separation instead of d-separation and three candidate sufficient separating sets for MR-separation. Simulation results showed that MRSL achieved up to a 2-fold higher F1 score and 100 times faster computing time than eight other competing methods. Furthermore, we applied MRSL to 26 biomarkers and 44 International Classification of Diseases 10 (ICD10)-defined diseases using GWAS summary data from UK Biobank. The results cover most of the expected causal links that have biological interpretations and several new links supported by clinical case reports or previous observational literature.
Affiliation(s)
- Lei Hou
- Beijing International Center for Mathematical Research, Peking University, Beijing, People’s Republic of China, 100871
- Zhi Geng
- School of Mathematics and Statistics, Beijing Technology and Business University, Beijing, People’s Republic of China, 100048
- Zhongshang Yuan
- Department of Epidemiology and Health Statistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, People’s Republic of China, 250000
- Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan, People’s Republic of China, 250000
- Xu Shi
- Department of Biostatistics, University of Michigan, Ann Arbor, USA
- Chuan Wang
- Qilu Hospital, Cheeloo College of Medicine, Shandong University, Jinan, People's Republic of China, 250000
- Feng Chen
- School of Public Health, Nanjing Medical University, Nanjing, China, 211166
- Hongkai Li
- Department of Epidemiology and Health Statistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, People’s Republic of China, 250000
- Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan, People’s Republic of China, 250000
- Fuzhong Xue
- Department of Epidemiology and Health Statistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, People’s Republic of China, 250000
- Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan, People’s Republic of China, 250000
- Qilu Hospital, Cheeloo College of Medicine, Shandong University, Jinan, People's Republic of China, 250000
4
Ruiz Sarrias O, Gónzalez Deza C, Rodríguez Rodríguez J, Arrizibita Iriarte O, Vizcay Atienza A, Zumárraga Lizundia T, Sayar Beristain O, Aldaz Pastor A. Predicting Severe Haematological Toxicity in Gastrointestinal Cancer Patients Undergoing 5-FU-Based Chemotherapy: A Bayesian Network Approach. Cancers (Basel) 2023; 15:4206. [PMID: 37686482 PMCID: PMC10486471 DOI: 10.3390/cancers15174206] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2023] [Revised: 07/28/2023] [Accepted: 08/16/2023] [Indexed: 09/10/2023] Open
Abstract
PURPOSE Severe toxicity is reported in about 30% of gastrointestinal cancer patients receiving 5-Fluorouracil (5-FU)-based chemotherapy. To date, limited tools exist to identify at risk patients in this setting. The objective of this study was to address this need by designing a predictive model using a Bayesian network, a probabilistic graphical model offering robust, explainable predictions. METHODS We utilized a dataset of 267 gastrointestinal cancer patients, conducting preprocessing, and splitting it into TRAIN and TEST sets (80%:20% ratio). The RandomForest algorithm assessed variable importance based on MeanDecreaseGini coefficient. The bnlearn R library helped design a Bayesian network model using a 10-fold cross-validation on the TRAIN set and the aic-cg method for network structure optimization. The model's performance was gauged based on accuracy, sensitivity, and specificity, using cross-validation on the TRAIN set and independent validation on the TEST set. RESULTS The model demonstrated satisfactory performance with an average accuracy of 0.85 (±0.05) and 0.80 on TRAIN and TEST datasets, respectively. The sensitivity and specificity were 0.82 (±0.14) and 0.87 (±0.07) for the TRAIN dataset, and 0.71 and 0.83 for the TEST dataset, respectively. A user-friendly tool was developed for clinical implementation. CONCLUSIONS Despite several limitations, our Bayesian network model demonstrated a high level of accuracy in predicting the risk of developing severe haematological toxicity in gastrointestinal cancer patients receiving 5-FU-based chemotherapy. Future research should aim at model validation in larger cohorts of patients and different clinical settings.
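The three performance measures named in this abstract (accuracy, sensitivity, specificity) all derive from a binary confusion matrix. The following is a minimal Python sketch of that arithmetic only; the function name `classification_metrics` and the toy labels are illustrative assumptions and do not reproduce the study's data, its bnlearn model, or its R code.

```python
import math


def classification_metrics(y_true, y_pred):
    """Return accuracy, sensitivity and specificity for binary labels (1 = event).

    Hypothetical helper for illustration; not taken from the cited study.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(numerator, denominator):
        # Undefined ratios (empty class) are reported as NaN rather than raising.
        return numerator / denominator if denominator else math.nan

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),  # recall on the event class
        "specificity": ratio(tn, tn + fp),  # recall on the non-event class
    }


# Toy labels (assumed, not study data): 3 events and 5 non-events.
toy_truth = [1, 1, 1, 0, 0, 0, 0, 0]
toy_prediction = [1, 1, 0, 0, 0, 0, 0, 1]
print(classification_metrics(toy_truth, toy_prediction))
```

Reporting sensitivity and specificity alongside accuracy matters when the event is the minority class, because accuracy alone can look high when most events are missed.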
Affiliation(s)
- Oskitz Ruiz Sarrias
- Department of Mathematics and Statistic, NNBi, 31191 Esquiroz, Navarra, Spain; (O.R.S.)
- Cristina Gónzalez Deza
- Department of Medical Oncology, Clínica Universidad De Navarra, 31008 Pamplona, Navarra, Spain; (C.G.D.); (J.R.R.); (T.Z.L.)
- Javier Rodríguez Rodríguez
- Department of Medical Oncology, Clínica Universidad De Navarra, 31008 Pamplona, Navarra, Spain; (C.G.D.); (J.R.R.); (T.Z.L.)
- Angel Vizcay Atienza
- Department of Medical Oncology, Clínica Universidad De Navarra, 31008 Pamplona, Navarra, Spain; (C.G.D.); (J.R.R.); (T.Z.L.)
- Teresa Zumárraga Lizundia
- Department of Medical Oncology, Clínica Universidad De Navarra, 31008 Pamplona, Navarra, Spain; (C.G.D.); (J.R.R.); (T.Z.L.)
5
Can H, Chanumolu SK, Nielsen BD, Alvarez S, Naldrett MJ, Ünlü G, Otu HH. Integration of Meta-Multi-Omics Data Using Probabilistic Graphs and External Knowledge. Cells 2023; 12:1998. [PMID: 37566077 PMCID: PMC10417344 DOI: 10.3390/cells12151998] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2023] [Revised: 07/11/2023] [Accepted: 08/02/2023] [Indexed: 08/12/2023] Open
Abstract
Multi-omics has the promise to provide a detailed molecular picture of biological systems. Although obtaining multi-omics data is relatively easy, methods that analyze such data have been lagging. In this paper, we present an algorithm that uses probabilistic graph representations and external knowledge to perform optimal structure learning and deduce a multifarious interaction network for multi-omics data from a bacterial community. Kefir grain, a microbial community that ferments milk and creates kefir, represents a self-renewing, stable, natural microbial community. Kefir has been shown to have a wide range of health benefits. We obtained a controlled bacterial community using the two most abundant and well-studied species in kefir grains: Lentilactobacillus kefiri and Lactobacillus kefiranofaciens. We applied growth temperatures of 30 °C and 37 °C and obtained transcriptomic, metabolomic, and proteomic data for the same 20 samples (10 samples per temperature). We obtained a multi-omics interaction network, which generated insights that would not have been possible with single-omics analysis. We identified interactions among transcripts, proteins, and metabolites, suggesting active toxin/antitoxin systems. We also observed multifarious interactions that involved the shikimate pathway. These observations helped explain bacterial adaptation to different stress conditions, co-aggregation, and increased activation of L. kefiranofaciens at 37 °C.
Affiliation(s)
- Handan Can
- Department of Electrical and Computer Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588, USA
- Sree K. Chanumolu
- Department of Electrical and Computer Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588, USA
- Barbara D. Nielsen
- Department of Animal, Veterinary and Food Sciences, University of Idaho, Moscow, ID 83844, USA
- Sophie Alvarez
- Proteomics and Metabolomics Facility, Nebraska Center for Biotechnology, University of Nebraska-Lincoln, Lincoln, NE 68588, USA
- Michael J. Naldrett
- Proteomics and Metabolomics Facility, Nebraska Center for Biotechnology, University of Nebraska-Lincoln, Lincoln, NE 68588, USA
- Gülhan Ünlü
- Department of Animal, Veterinary and Food Sciences, University of Idaho, Moscow, ID 83844, USA
- Department of Chemical and Biological Engineering, University of Idaho, Moscow, ID 83844, USA
- School of Food Science, Washington State University, Pullman, WA 99164, USA
- Hasan H. Otu
- Department of Electrical and Computer Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588, USA
6
A Series-Based Deep Learning Approach to Lung Nodule Image Classification. Cancers (Basel) 2023; 15:cancers15030843. [PMID: 36765801 PMCID: PMC9913559 DOI: 10.3390/cancers15030843] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/19/2022] [Revised: 01/24/2023] [Accepted: 01/28/2023] [Indexed: 02/01/2023] Open
Abstract
Although many studies have shown that deep learning approaches yield better results than traditional methods based on manual features, CADs methods still have several limitations. These are due to the diversity in imaging modalities and clinical pathologies. This diversity creates difficulties because of variation and similarities between classes. In this context, the new approach from our study is a hybrid method that performs classifications using both medical image analysis and radial scanning series features. Hence, the areas of interest obtained from images are subjected to a radial scan, with their centers as poles, in order to obtain series. A U-shape convolutional neural network model is then used for the 4D data classification problem. We therefore present a novel approach to the classification of 4D data obtained from lung nodule images. With radial scanning, the eigenvalue of nodule images is captured, and a powerful classification is performed. According to our results, an accuracy of 92.84% was obtained and much more efficient classification scores resulted as compared to recent classifiers.
7
Bulbul Ahmed M, Humayan Kabir A. Understanding of the various aspects of gene regulatory networks related to crop improvement. Gene 2022; 833:146556. [PMID: 35609798 DOI: 10.1016/j.gene.2022.146556] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/30/2021] [Revised: 03/14/2022] [Accepted: 05/06/2022] [Indexed: 12/30/2022]
Abstract
The hierarchical relationship between transcription factors, associated proteins, and their target genes is defined by a gene regulatory network (GRN). GRNs allow us to understand how the genotype and environment of a plant are incorporated to control the downstream physiological responses. During plant growth or environmental acclimatization, GRNs are diverse and can be differently regulated across tissue types and organs. An overview of recent advances in the development of GRN that speed up basic and applied plant research is given here. Furthermore, the overview of genome and transcriptome involving GRN research along with the exciting advancement and application are discussed. In addition, different approaches to GRN predictions were elucidated. In this review, we also describe the role of GRN in crop improvement, crop plant manipulation, stress responses, speed breeding and identifying genetic variations/locus. Finally, the challenges and prospects of GRN in plant biology are discussed.
Affiliation(s)
- Md Bulbul Ahmed
- Plant Science Department, McGill University, 21111 Lakeshore Road, Ste. Anne de Bellevue H9X3V9, Quebec, Canada; Institut de Recherche en Biologie Végétale (IRBV), University of Montreal, Montréal, Québec H1X 2B2, Canada.
8
Identifying large scale interaction atlases using probabilistic graphs and external knowledge. J Clin Transl Sci 2022; 6:e27. [PMID: 35321220 PMCID: PMC8922291 DOI: 10.1017/cts.2022.18] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2021] [Revised: 12/29/2021] [Accepted: 02/07/2022] [Indexed: 11/17/2022] Open
Abstract
Introduction: Reconstruction of gene interaction networks from experimental data provides a deep understanding of the underlying biological mechanisms. The noisy nature of the data and the large size of the network make this a very challenging task. Complex approaches handle the stochastic nature of the data but can only do this for small networks; simpler, linear models generate large networks but with less reliability. Methods: We propose a divide-and-conquer approach using probabilistic graph representations and external knowledge. We cluster the experimental data and learn an interaction network for each cluster, which are merged using the interaction network for the representative genes selected for each cluster. Results: We generated an interaction atlas for 337 human pathways yielding a network of 11,454 genes with 17,777 edges. Simulated gene expression data from this atlas formed the basis for reconstruction. Based on the area under the curve of the precision-recall curve, the proposed approach outperformed the baseline (random classifier) by ∼15-fold and conventional methods by ∼5–17-fold. The performance of the proposed workflow is significantly linked to the accuracy of the clustering step that tries to identify the modularity of the underlying biological mechanisms. Conclusions: We provide an interaction atlas generation workflow optimizing the algorithm/parameter selection. The proposed approach integrates external knowledge in the reconstruction of the interactome using probabilistic graphs. Network characterization and understanding long-range effects in interaction atlases provide means for comparative analysis with implications in biomarker discovery and therapeutic approaches. The proposed workflow is freely available at http://otulab.unl.edu/atlas.
9
Martínez-García M, Hernández-Lemus E. Data Integration Challenges for Machine Learning in Precision Medicine. Front Med (Lausanne) 2022; 8:784455. [PMID: 35145977 PMCID: PMC8821900 DOI: 10.3389/fmed.2021.784455] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 09/27/2021] [Accepted: 12/28/2021] [Indexed: 12/19/2022] Open
Abstract
A main goal of Precision Medicine is to incorporate and integrate the vast corpora held in different databases about the molecular and environmental origins of disease into analytic frameworks, allowing the development of individualized, context-dependent diagnostic and therapeutic approaches. In this regard, artificial intelligence and machine learning approaches can be used to build analytical models of complex disease aimed at predicting personalized health conditions and outcomes. Such models must handle the wide heterogeneity of individuals in both their genetic predisposition and their social and environmental determinants. Computational approaches to medicine need to be able to efficiently manage, visualize, and integrate large datasets combining structured and unstructured formats. This needs to be done under different levels of confidentiality, ideally within a unified analytical architecture. Efficient data integration and management is key to the successful application of computational intelligence approaches to medicine. A number of challenges arise in designing successful approaches to medical data analytics under the currently demanding performance conditions of personalized medicine, while also subject to time, computational power, and bioethical constraints. Here, we review some of these constraints and discuss possible avenues to overcome current challenges.
Affiliation(s)
- Mireya Martínez-García
- Clinical Research Division, National Institute of Cardiology ‘Ignacio Chávez’, Mexico City, Mexico
- Enrique Hernández-Lemus
- Computational Genomics Division, National Institute of Genomic Medicine (INMEGEN), Mexico City, Mexico
- Center for Complexity Sciences, Universidad Nacional Autónoma de México, Mexico City, Mexico
10
Tripp BA, Otu HH. Integration of Multi-Omics Data Using Probabilistic Graph Models and External Knowledge. Curr Bioinform 2022. [DOI: 10.2174/1574893616666210906141545] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
Background: High-throughput sequencing technologies have revolutionized the ability to perform systems-level biology and elucidate molecular mechanisms of disease through the comprehensive characterization of different layers of biological information. Integration of these heterogeneous layers can provide insight into the underlying biology but is challenged by modeling complex interactions. Objective: We introduce OBaNK: omics integration using Bayesian networks and external knowledge, an algorithm to model interactions between heterogeneous high-dimensional biological data to elucidate complex functional clusters and emergent relationships associated with an observed phenotype. Method: Using Bayesian network learning, we modeled the statistical dependencies and interactions between lipidomics, proteomics, and metabolomics data. The strength of a learned interaction between molecules was altered based on external knowledge. Results: Networks learned from synthetic datasets based on real pathways achieved an average area under the curve score of ~0.85, an improvement of ~0.23 from baseline methods. When applied to real multi-omics data collected during pregnancy, five distinct functional networks of heterogeneous biological data were identified, and the results were compared to other multi-omics integration approaches. Conclusion: OBaNK successfully improved the accuracy of learning interaction networks from data integrating external knowledge, identified heterogeneous functional networks from real data, and suggested potential novel interactions associated with the phenotype. These findings can guide future hypothesis generation. OBaNK source code is available at: https://github.com/bridgettripp/OBaNK.git, and a graphical user interface is available at: http://otulab.unl.edu/OBaNK.
Affiliation(s)
- Bridget A. Tripp
- Department of Electrical and Computer Engineering, University of Nebraska-Lincoln, Lincoln, Nebraska, USA
- PhD Program of Complex Biosystems, University of Nebraska-Lincoln, Lincoln, Nebraska, USA
- Hasan H. Otu
- Department of Electrical and Computer Engineering, University of Nebraska-Lincoln, Lincoln, Nebraska, USA
11
Abstract
Cancer is a genetic disease in which multiple genes are perturbed. Thus, information about the regulatory relationships between genes is necessary for the identification of biomarkers and therapeutic targets. In this review, methods for inference of gene regulatory networks (GRNs) from transcriptomics data that are used in cancer research are introduced. The methods are classified into three categories according to the analysis model. The first category includes methods that use pair-wise measures between genes, including correlation coefficient and mutual information. The second category includes methods that determine the genetic regulatory relationship using multivariate measures, which consider the expression profiles of all genes concurrently. The third category includes methods using supervised and integrative approaches. The supervised approach estimates the regulatory relationship using a supervised learning method that constructs a regression or classification model for predicting whether there is a regulatory relationship between genes with input data of gene expression profiles and class labels of prior biological knowledge. The integrative method is an expansion of the supervised method and uses more data and biological knowledge for predicting the regulatory relationship. Furthermore, simulation and experimental validation of the estimated GRNs are also discussed in this review. This review identified that most GRN inference methods are not specific for cancer transcriptome data, and such methods are required for better understanding of cancer pathophysiology. In addition, more systematic methods for validation of the estimated GRNs need to be developed in the context of cancer biology.
12
Salimi D, Moeini A. Incorporating K-mers Highly Correlated to Epigenetic Modifications for Bayesian Inference of Gene Interactions. Curr Bioinform 2021. [DOI: 10.2174/1574893615999200728193621] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/22/2022]
Abstract
Objective: A gene interaction network, along with its related biological features, has an important role in computational biology. The Bayesian network, an efficient model based on probabilistic concepts, is able to exploit known and novel biological causal relationships between genes. The success of Bayesian networks in predicting these relationships greatly depends on the selection of priors. Methods: K-mers have been applied as prominent features to uncover the similarity between genes in a specific pathway, suggesting that this feature can be applied to study gene dependencies. In this study, we propose k-mers (4-, 5-, and 6-mers) highly correlated with epigenetic modifications, including 17 modifications, as a new prior for Bayesian inference in the gene interaction network. Result: Employing this model on a network of 23 human genes and on a network based on 27 genes related to yeast resulted in F-measure improvements in different biological networks. Conclusion: The improvements in the best case are 12%, 36%, and 10% in the pathway, coexpression, and physical interaction networks, respectively.
Affiliation(s)
- Dariush Salimi
- Department of Animal Science, Faculty of Agriculture, University of Zanjan, Zanjan, Iran
- Ali Moeini
- Department of Algorithms and Computation, Faculty of Engineering Science, College of Engineering, University of Tehran, Tehran, Iran
13
Chanumolu SK, Albahrani M, Can H, Otu HH. KEGG2Net: Deducing gene interaction networks and acyclic graphs from KEGG pathways. EMBnet.journal 2021; 26. [PMID: 33880340 PMCID: PMC8055051 DOI: 10.14806/ej.26.0.949] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 11/27/2022]
Abstract
The Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database provides a manual curation of biological pathways that involve genes (or gene products), metabolites, chemical compounds, maps, and other entries. However, most applications and datasets involved in omics are gene or protein-centric requiring pathway representations that include direct and indirect interactions only between genes. Furthermore, special methodologies, such as Bayesian networks require acyclic representations of graphs. We developed KEGG2Net, a web resource that generates a network involving only the genes represented on a KEGG pathway with all of the direct and indirect gene-gene interactions deduced from the pathway. KEGG2Net offers four different methods to remove cycles from the resulting gene interaction network, converting them into directed acyclic graphs (DAGs). We generated synthetic gene expression data using the gene interaction networks deduced from the KEGG pathways and performed a comparative analysis of different cycle removal methods by testing the fitness of their DAGs to the data and by the number of edges they eliminate. Our results indicate that an ensemble method for cycle removal performs as the best approach to convert the gene interaction networks into DAGs. Resulting gene interaction networks and DAGs are represented in multiple user-friendly formats that can be used in other applications, and as images for quick and easy visualisation. The KEGG2Net web portal converts KEGG maps for any organism into gene-gene interaction networks and corresponding DAGS representing all of the direct and indirect interactions among the genes.
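The abstract does not name the four cycle-removal methods that KEGG2Net offers, so the Python sketch below uses a generic textbook technique as a stand-in: a depth-first search that drops every back edge, which is one standard way to turn a directed graph into a DAG. The function names and the toy gene labels are hypothetical, and nothing here should be read as the KEGG2Net implementation.

```python
from collections import deque


def remove_cycles(nodes, edges):
    """Drop DFS back edges so the remaining directed graph is acyclic.

    Generic illustration only; assumed, not one of KEGG2Net's documented methods.
    Uses recursion, so it is intended for small pathway-sized graphs.
    """
    adjacency = {node: [] for node in nodes}
    for source, target in edges:
        adjacency[source].append(target)

    UNSEEN, ACTIVE, DONE = 0, 1, 2
    state = {node: UNSEEN for node in nodes}
    kept, removed = [], []

    def visit(node):
        state[node] = ACTIVE
        for target in adjacency[node]:
            if state[target] == ACTIVE:
                # Target is on the current DFS path: this edge closes a cycle.
                removed.append((node, target))
            else:
                kept.append((node, target))
                if state[target] == UNSEEN:
                    visit(target)
        state[node] = DONE

    for node in nodes:
        if state[node] == UNSEEN:
            visit(node)
    return kept, removed


def is_acyclic(nodes, edges):
    """Kahn's algorithm: a graph is a DAG iff every node can be peeled off."""
    indegree = {node: 0 for node in nodes}
    for _, target in edges:
        indegree[target] += 1
    queue = deque(node for node in nodes if indegree[node] == 0)
    peeled = 0
    while queue:
        node = queue.popleft()
        peeled += 1
        for source, target in edges:
            if source == node:
                indegree[target] -= 1
                if indegree[target] == 0:
                    queue.append(target)
    return peeled == len(nodes)


# Toy network with one feedback loop (geneA -> geneB -> geneC -> geneA).
toy_nodes = ["geneA", "geneB", "geneC", "geneD"]
toy_edges = [("geneA", "geneB"), ("geneB", "geneC"),
             ("geneC", "geneA"), ("geneC", "geneD")]
dag_edges, dropped = remove_cycles(toy_nodes, toy_edges)
print(dag_edges, dropped, is_acyclic(toy_nodes, dag_edges))
```

Which edge is dropped depends on the traversal order, which is one reason comparing several removal strategies by the number of edges they eliminate and by fit to data, as the abstract describes, is informative.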
Affiliation(s)
- Sree K Chanumolu
- Department of Electrical and Computer Engineering, University of Nebraska-Lincoln, Lincoln, NE, United States
- Mustafa Albahrani
- Department of Electrical and Computer Engineering, University of Nebraska-Lincoln, Lincoln, NE, United States
- Handan Can
- Department of Electrical and Computer Engineering, University of Nebraska-Lincoln, Lincoln, NE, United States
- Hasan H Otu
- Department of Electrical and Computer Engineering, University of Nebraska-Lincoln, Lincoln, NE, United States
14
Liu J, Tian Z, Xiao Y, Liu H, Hao S, Zhang X, Wang C, Sun J, Yu H, Yan J. Gene Regulatory Relationship Mining Using Improved Three-Phase Dependency Analysis Approach. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:339-346. [PMID: 30281476 DOI: 10.1109/tcbb.2018.2872993] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
Mining gene regulatory relationships and constructing gene regulatory networks (GRNs) is of utmost interest to the whole biological community, but it has consistently been a challenging problem because of the tremendous complexity of cellular systems. In the present work, we construct a gene regulatory network using an improved three-phase dependency analysis (TPDA) Bayesian network learning method, which includes the steps of Drafting, Thickening, and Thinning. To address the unreliable learning results caused by high-order conditional independence tests, we use an entropy estimation approach based on a Gaussian kernel probability density estimator to calculate the (conditional) mutual information between genes. Experiments on public benchmark data sets show that the improved method outperforms nine other Bayesian network learning methods when processing data with a large sample size, a small number of discrete values, and roughly equal frequencies of the different discrete values. In addition, the improved TPDA method was applied to a large real RNA-seq gene expression data set from a global collection of 368 elite maize inbred lines. The experimental results show that it performs significantly better than the original TPDA method and the nine other Bayesian network learning algorithms.
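To make the kernel-based mutual information step more concrete, here is a minimal Python sketch of a resubstitution entropy estimate built on a Gaussian kernel density estimator, combined as I(X;Y) = H(X) + H(Y) - H(X,Y). The fixed bandwidth, the function names, and the synthetic data are all assumptions made for illustration; the abstract does not give the authors' estimator details, and this sketch does not cover the conditional case.

```python
import math
import random


def kde_entropy(points, bandwidth):
    """Resubstitution entropy estimate: minus the mean log KDE density at each sample.

    `points` is a list of equal-length tuples; an isotropic Gaussian kernel with a
    fixed bandwidth is assumed (a simplification chosen for this sketch).
    """
    n = len(points)
    dim = len(points[0])
    normaliser = n * (bandwidth * math.sqrt(2.0 * math.pi)) ** dim
    total_log_density = 0.0
    for p in points:
        kernel_sum = sum(
            math.exp(-sum((a - b) ** 2 for a, b in zip(p, q)) / (2.0 * bandwidth ** 2))
            for q in points
        )
        total_log_density += math.log(kernel_sum / normaliser)
    return -total_log_density / n


def kde_mutual_information(xs, ys, bandwidth=0.5):
    """Estimate I(X;Y) in nats from paired samples via three KDE entropies."""
    h_x = kde_entropy([(x,) for x in xs], bandwidth)
    h_y = kde_entropy([(y,) for y in ys], bandwidth)
    h_xy = kde_entropy(list(zip(xs, ys)), bandwidth)
    return h_x + h_y - h_xy


# Synthetic expression-like values (assumed, not from the cited data sets).
rng = random.Random(0)
regulator = [rng.gauss(0.0, 1.0) for _ in range(100)]
dependent_target = [x + rng.gauss(0.0, 0.1) for x in regulator]
unrelated_gene = [rng.gauss(0.0, 1.0) for _ in range(100)]
print(kde_mutual_information(regulator, dependent_target))
print(kde_mutual_information(regulator, unrelated_gene))
```

The estimate is sensitive to the bandwidth and costs O(n^2) kernel evaluations per entropy term, so a practical implementation would likely choose the bandwidth from the data and use a faster density routine.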
15
Eicher T, Patt A, Kautto E, Machiraju R, Mathé E, Zhang Y. Challenges in proteogenomics: a comparison of analysis methods with the case study of the DREAM proteogenomics sub-challenge. BMC Bioinformatics 2019; 20:669. [PMID: 31861998 PMCID: PMC6923881 DOI: 10.1186/s12859-019-3253-z] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 02/06/2023] Open
Abstract
Background: Proteomic measurements, which closely reflect phenotypes, provide insights into gene expression regulations and mechanisms underlying altered phenotypes. Further, integration of data on proteome and transcriptome levels can validate gene signatures associated with a phenotype. However, proteomic data is not as abundant as genomic data, and it is thus beneficial to use genomic features to predict protein abundances when matching proteomic samples or measurements within samples are lacking. Results: We evaluate and compare four data-driven models for prediction of proteomic data from mRNA measured in breast and ovarian cancers using the 2017 DREAM Proteogenomics Challenge data. Our results show that Bayesian network, random forests, LASSO, and fuzzy logic approaches can predict protein abundance levels with median ground truth-predicted correlation values between 0.2 and 0.5. However, the most accurately predicted proteins differ considerably between approaches. Conclusions: In addition to benchmarking aforementioned machine learning approaches for predicting protein levels from transcript levels, we discuss challenges and potential solutions in state-of-the-art proteogenomic analyses.
Affiliation(s)
- Tara Eicher
- Department of Computer Science and Engineering, The Ohio State University, Columbus, OH, 43210, USA
- Andrew Patt
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA
- Esko Kautto
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA
- Raghu Machiraju
- Department of Computer Science and Engineering, The Ohio State University, Columbus, OH, 43210, USA; Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA
- Ewy Mathé
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA
- Yan Zhang
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA; The Ohio State University Comprehensive Cancer Center (OSUCCC - James), Columbus, OH, 43210, USA
16
Kourou K, Rigas G, Papaloukas C, Mitsis M, Fotiadis DI. Cancer classification from time series microarray data through regulatory Dynamic Bayesian Networks. Comput Biol Med 2019; 116:103577. [PMID: 32001012 DOI: 10.1016/j.compbiomed.2019.103577] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 10/21/2019] [Revised: 12/05/2019] [Accepted: 12/05/2019] [Indexed: 01/05/2023]
Abstract
Genomic profiling of cancer studies has generated comprehensive gene expression patterns for diverse phenotypes. Computational methods which employ transcriptomics datasets have been proposed to model gene expression data. Dynamic Bayesian Networks (DBNs) have been used for modeling time series datasets and for the inference of regulatory networks. Furthermore, cancer classification through DBN-based approaches could reveal the importance of exploiting knowledge from statistically significant genes and key regulatory molecules. Although microarray datasets have been employed extensively by several classification methods for decision making, the use of new knowledge from the pathway level has not been addressed adequately in the literature in terms of DBNs for cancer classification. In the present study, we identify the genes that act as regulators and mediate the activity of transcription factors that have been found in all promoters of our differentially expressed gene sets. These features serve as potential priors for distinguishing tumor from normal samples using a DBN-based classification approach. We employed three microarray datasets from the Gene Expression Omnibus (GEO) public functional repository and performed differential expression analysis. Promoter and pathway analysis of the identified genes revealed the key regulators which influence the transcription mechanisms of these genes. We applied the DBN algorithm to selected genes and identified the features that can accurately classify the samples into tumors and controls. Both accuracy and Area Under the Curve (AUC) were high for the gene sets comprising the differentially expressed genes along with their master regulators (accuracy: 70.8%-98.5%; AUC: 0.562-0.985).
Affiliation(s)
- Konstantina Kourou
- Unit of Medical Technology and Intelligent Information Systems, Dept. of Materials Science and Engineering, University of Ioannina, GR 45110, Greece; Dept. of Biological Applications and Technology, University of Ioannina, Ioannina, GR, 45110, Greece
- George Rigas
- Unit of Medical Technology and Intelligent Information Systems, Dept. of Materials Science and Engineering, University of Ioannina, GR 45110, Greece
- Costas Papaloukas
- Unit of Medical Technology and Intelligent Information Systems, Dept. of Materials Science and Engineering, University of Ioannina, GR 45110, Greece; Dept. of Biological Applications and Technology, University of Ioannina, Ioannina, GR, 45110, Greece
- Michalis Mitsis
- Dept. of Surgery and Cancer Biobank Center, School of Health Sciences, Faculty of Medicine, University of Ioannina, 45110, Ioannina, GR 45110, Greece
- Dimitrios I Fotiadis
- Unit of Medical Technology and Intelligent Information Systems, Dept. of Materials Science and Engineering, University of Ioannina, GR 45110, Greece; Foundation for Research and Technology-Hellas, Institute of Molecular Biology and Biotechnology, Dept. of Biomedical Research, GR 45110, Greece
17
Wani N, Raza K. Integrative approaches to reconstruct regulatory networks from multi-omics data: A review of state-of-the-art methods. Comput Biol Chem 2019; 83:107120. [PMID: 31499298 DOI: 10.1016/j.compbiolchem.2019.107120] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Received: 08/03/2018] [Revised: 02/22/2019] [Accepted: 08/27/2019] [Indexed: 02/06/2023]
Abstract
Data generation using high throughput technologies has led to the accumulation of diverse types of molecular data. These data have different types (discrete, real, string, etc.) and occur in various formats and sizes. Datasets including gene expression, miRNA expression, protein-DNA binding data (ChIP-Seq/ChIP-ChIP), mutation data (copy number variation, single nucleotide polymorphisms), annotations, interactions, and association data are some of the commonly used biological datasets to study various cellular mechanisms of living organisms. Each of them provides a unique, complementary and partly independent view of the genome and hence embeds essential information about the regulatory mechanisms of genes and their products. Therefore, integrating these data and inferring regulatory interactions from them offers a system level of biological insight in predicting gene functions and their phenotypic outcomes. To study genome functionality through regulatory networks, different methods have been proposed for collective mining of information from an integrated dataset. We survey here integration methods that reconstruct regulatory networks using state-of-the-art techniques to handle multi-omics (i.e., genomic, transcriptomic, proteomic) and other biological datasets.
Affiliation(s)
- Nisar Wani
- Govt. Degree College Baramulla, J & K, India; Department of Computer Science, Jamia Millia Islamia, New Delhi, India
- Khalid Raza
- Department of Computer Science, Jamia Millia Islamia, New Delhi, India
18
Jiang D, Armour CR, Hu C, Mei M, Tian C, Sharpton TJ, Jiang Y. Microbiome Multi-Omics Network Analysis: Statistical Considerations, Limitations, and Opportunities. Front Genet 2019; 10:995. [PMID: 31781153 PMCID: PMC6857202 DOI: 10.3389/fgene.2019.00995] [Citation(s) in RCA: 88] [Impact Index Per Article: 14.7] [Received: 02/13/2019] [Accepted: 09/18/2019] [Indexed: 12/21/2022] Open
Abstract
The advent of large-scale microbiome studies affords newfound analytical opportunities to understand how these communities of microbes operate and relate to their environment. However, the analytical methodology needed to model microbiome data and integrate them with other data constructs remains nascent. This emergent analytical toolset frequently ports over techniques developed in other multi-omics investigations, especially the growing array of statistical and computational techniques for integrating and representing data through networks. While network analysis has emerged as a powerful approach to modeling microbiome data, oftentimes by integrating these data with other types of omics data to discern their functional linkages, it is not always evident if the statistical details of the approach being applied are consistent with the assumptions of microbiome data or how they impact data interpretation. In this review, we overview some of the most important network methods for integrative analysis, with an emphasis on methods that have been applied or have great potential to be applied to the analysis of multi-omics integration of microbiome data. We compare advantages and disadvantages of various statistical tools, assess their applicability to microbiome data, and discuss their biological interpretability. We also highlight on-going statistical challenges and opportunities for integrative network analysis of microbiome data.
Affiliation(s)
- Duo Jiang
- Department of Statistics, Oregon State University, Corvallis, OR, United States
- Courtney R Armour
- Department of Microbiology, Oregon State University, Corvallis, OR, United States
- Chenxiao Hu
- Department of Statistics, Oregon State University, Corvallis, OR, United States
- Meng Mei
- Department of Statistics, Oregon State University, Corvallis, OR, United States
- Chuan Tian
- Department of Statistics, Oregon State University, Corvallis, OR, United States
- Thomas J Sharpton
- Department of Statistics, Oregon State University, Corvallis, OR, United States
- Department of Microbiology, Oregon State University, Corvallis, OR, United States
- Yuan Jiang
- Department of Statistics, Oregon State University, Corvallis, OR, United States
19
de Campos LM, Cano A, Castellano JG, Moral S. Combining gene expression data and prior knowledge for inferring gene regulatory networks via Bayesian networks using structural restrictions. Stat Appl Genet Mol Biol 2019; 18:sagmb-2018-0042. [PMID: 31042646 DOI: 10.1515/sagmb-2018-0042] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 11/15/2022]
Abstract
Gene Regulatory Networks (GRNs) are known as the most adequate instrument for providing clear insight into and understanding of cellular systems. One of the most successful techniques for reconstructing GRNs from gene expression data is Bayesian networks (BNs), which have proven to be an ideal approach for heterogeneous data integration in the learning process. So far, however, prior knowledge has been incorporated by using prior beliefs or by using networks as a starting point in the search process. In this work, the utilization of different kinds of structural restrictions within algorithms for learning BNs from gene expression data is considered. These restrictions codify prior knowledge in such a way that a BN must satisfy them. One aim of this work is therefore to provide a detailed review of the use of prior knowledge and gene expression data for inferring GRNs from BNs, but the major purpose of this paper is to investigate whether structural learning algorithms for BNs from expression data can achieve better outcomes by exploiting this prior knowledge through structural restrictions. The experimental study shows that this new way of incorporating prior knowledge leads to better reverse-engineered networks.
Affiliation(s)
- Luis M de Campos
- Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
- Andrés Cano
- Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
- Javier G Castellano
- Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
- Serafín Moral
- Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
20
Franks AM, Markowetz F, Airoldi EM. Refining cellular pathway models using an ensemble of heterogeneous data sources. Ann Appl Stat 2018; 12:1361-1384. [PMID: 36506698 PMCID: PMC9733905 DOI: 10.1214/16-aoas915] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/15/2022]
Abstract
Improving current models and hypotheses of cellular pathways is one of the major challenges of systems biology and functional genomics. There is a need for methods to build on established expert knowledge and reconcile it with results of new high-throughput studies. Moreover, the available sources of data are heterogeneous, and the data need to be integrated in different ways depending on which part of the pathway they are most informative for. In this paper, we introduce a compartment specific strategy to integrate edge, node and path data for refining a given network hypothesis. To carry out inference, we use a local-move Gibbs sampler for updating the pathway hypothesis from a compendium of heterogeneous data sources, and a new network regression idea for integrating protein attributes. We demonstrate the utility of this approach in a case study of the pheromone response MAPK pathway in the yeast S. cerevisiae.
Affiliation(s)
- Alexander M Franks
- Department of Statistics and Applied Probability, University of California, Santa Barbara, South Hall, Santa Barbara, California 93106, USA
- Florian Markowetz
- Cancer Research UK, Cambridge Institute, Li Ka Shing Centre, University of Cambridge, Robinson Way, Cambridge, CB2 0RE, United Kingdom
- Edoardo M Airoldi
- Fox School of Business, Department of Statistical Science, Temple University, Center for Data Science, 1810 Liacouras Walk, Philadelphia, Pennsylvania 19122, USA
21
Bayesian variable selection with graphical structure learning: Applications in integrative genomics. PLoS One 2018; 13:e0195070. [PMID: 30059495 PMCID: PMC6066211 DOI: 10.1371/journal.pone.0195070] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 10/20/2016] [Accepted: 03/12/2018] [Indexed: 11/19/2022] Open
Abstract
Significant advances in biotechnology have allowed for simultaneous measurement of molecular data across multiple genomic, epigenomic and transcriptomic levels from a single tumor/patient sample. This has motivated systematic data-driven approaches to integrate multi-dimensional structured datasets, since cancer development and progression is driven by numerous co-ordinated molecular alterations and the interactions between them. We propose a novel multi-scale Bayesian approach that combines integrative graphical structure learning from multiple sources of data with a variable selection framework—to determine the key genomic drivers of cancer progression. The integrative structure learning is first accomplished through novel joint graphical models for heterogeneous (mixed scale) data, allowing for flexible and interpretable incorporation of prior existing knowledge. This subsequently informs a variable selection step to identify groups of co-ordinated molecular features within and across platforms associated with clinical outcomes of cancer progression, while according appropriate adjustments for multicollinearity and multiplicities. We evaluate our methods through rigorous simulations to establish superiority over existing methods that do not take the network and/or prior information into account. Our methods are motivated by and applied to a glioblastoma multiforme (GBM) dataset from The Cancer Genome Atlas to predict patient survival times integrating gene expression, copy number and methylation data. We find a high concordance between our selected prognostic gene network modules with known associations with GBM. In addition, our model discovers several novel cross-platform network interactions (both cis and trans acting) between gene expression, copy number variation associated gene dosing and epigenetic regulation through promoter methylation, some with known implications in the etiology of GBM. 
Our framework provides a useful tool for biomedical researchers, since clinical prediction using multi-platform genomic information is an important step towards personalized treatment of many cancers.
22
Agrahari R, Foroushani A, Docking TR, Chang L, Duns G, Hudoba M, Karsan A, Zare H. Applications of Bayesian network models in predicting types of hematological malignancies. Sci Rep 2018; 8:6951. [PMID: 29725024 PMCID: PMC5934387 DOI: 10.1038/s41598-018-24758-5] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Received: 10/04/2017] [Accepted: 04/05/2018] [Indexed: 12/17/2022] Open
Abstract
Network analysis is the preferred approach for the detection of subtle but coordinated changes in expression of an interacting and related set of genes. We introduce a novel method based on the analyses of coexpression networks and Bayesian networks, and we use this new method to classify two types of hematological malignancies, namely acute myeloid leukemia (AML) and myelodysplastic syndrome (MDS). Our classifier has an accuracy of 93%, a precision of 98%, and a recall of 90% on the training dataset (n = 366), which outperforms the results reported by other scholars on the same dataset. Although our training dataset consists of microarray data, our model has a remarkable performance on the RNA-Seq test dataset (n = 74, accuracy = 89%, precision = 88%, recall = 98%), which confirms that eigengenes are robust with respect to expression profiling technology. These signatures are useful in classification and in correctly predicting the diagnosis. They might also provide valuable information about the underlying biology of diseases. Our network analysis approach is generalizable and can be useful for classifying other diseases based on gene expression profiles. Our previously published Pigengene package is publicly available through Bioconductor and can be used to conveniently fit a Bayesian network to gene expression data.
Affiliation(s)
- Rupesh Agrahari
- Department of Computer Science, Texas State University, San Marcos, Texas, 78666, USA
- Amir Foroushani
- Department of Computer Science, Texas State University, San Marcos, Texas, 78666, USA
- T Roderick Docking
- Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia, V5Z 1L3, Canada
- Linda Chang
- Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia, V5Z 1L3, Canada
- Gerben Duns
- Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia, V5Z 1L3, Canada
- Monika Hudoba
- Department of Pathology and Laboratory Medicine, Vancouver General Hospital, Vancouver, British Columbia, V5Z 1M9, Canada
- Aly Karsan
- Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia, V5Z 1L3, Canada
- Habil Zare
- Department of Computer Science, Texas State University, San Marcos, Texas, 78666, USA; Department of Cell Systems & Anatomy, The University of Texas Health Science Center, San Antonio, Texas, 78229, USA
23
Yu B, Xu JM, Li S, Chen C, Chen RX, Wang L, Zhang Y, Wang MH. Inference of time-delayed gene regulatory networks based on dynamic Bayesian network hybrid learning method. Oncotarget 2017; 8:80373-80392. [PMID: 29113310 PMCID: PMC5655205 DOI: 10.18632/oncotarget.21268] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 07/20/2017] [Accepted: 08/27/2017] [Indexed: 01/31/2023] Open
Abstract
Gene regulatory network (GRN) research reveals complex life phenomena from the perspective of gene interaction and is an important research field in systems biology. Traditional Bayesian networks have a high computational complexity, and their network structure scoring model has a single feature. Information-based approaches cannot identify the direction of regulation. To make up for these shortcomings, this paper presents a novel hybrid learning method (DBNCS) based on the dynamic Bayesian network (DBN) that, for the first time, constructs multiple time-delayed GRNs by combining the comprehensive score (CS) with the DBN model. The DBNCS algorithm first uses the CMI2NI (conditional mutual inclusive information-based network inference) algorithm to learn network structure profiles, that is, to construct the search space. Redundant regulations are then removed with the recursive optimization (RO) algorithm, thereby reducing the false positive rate. Next, the network structure profiles are decomposed without loss into a set of cliques, which can significantly reduce the computational complexity. Finally, the DBN model is used to identify the direction of gene regulation within the cliques and to search for the optimal network structure. The performance of the DBNCS algorithm is evaluated on benchmark GRN datasets from the DREAM challenge as well as the SOS DNA repair network in Escherichia coli, and compared with other state-of-the-art methods. The experimental results show the rationality of the algorithm design and the outstanding performance of the GRNs.
Affiliation(s)
- Bin Yu
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
- CAS Key Laboratory of Geospace Environment, Department of Geophysics and Planetary Science, University of Science and Technology of China, Hefei 230026, China
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
- Jia-Meng Xu
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
- Shan Li
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
- Cheng Chen
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
- Rui-Xin Chen
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
- Lei Wang
- Key Laboratory of Eco-chemical Engineering, Ministry of Education, Laboratory of Inorganic Synthesis and Applied Chemistry, College of Chemistry and Molecular Engineering, Qingdao University of Science and Technology, Qingdao 266042, China
- Yan Zhang
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
- College of Electromechanical Engineering, Qingdao University of Science and Technology, Qingdao 266061, China
- Ming-Hui Wang
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
24
Kourou K, Rigas G, Exarchos KP, Papaloukas C, Fotiadis DI. Prediction of oral cancer recurrence using dynamic Bayesian networks. Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual International Conference 2017; 2016:5275-5278. [PMID: 28269454 DOI: 10.1109/embc.2016.7591917] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 11/07/2022]
Abstract
We propose a methodology for predicting oral cancer recurrence using Dynamic Bayesian Networks. The methodology takes into consideration time series gene expression data collected at the follow-up study of patients that had or had not suffered a disease relapse. Based on that knowledge, our aim is to infer the corresponding dynamic Bayesian networks and subsequently conjecture about the causal relationships among genes within the same time-slice and between consecutive time-slices. Moreover, the proposed methodology aims to (i) assess the prognosis of patients regarding oral cancer recurrence and at the same time, (ii) provide important information about the underlying biological processes of the disease.
25
Kpogbezan GB, van der Vaart AW, van Wieringen WN, Leday GGR, van de Wiel MA. An empirical Bayes approach to network recovery using external knowledge. Biom J 2017; 59:932-947. [PMID: 28393396 DOI: 10.1002/bimj.201600090] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 04/16/2016] [Revised: 11/22/2016] [Accepted: 12/04/2016] [Indexed: 11/12/2022]
Abstract
Reconstruction of a high-dimensional network may benefit substantially from the inclusion of prior knowledge on the network topology. In the case of gene interaction networks, such knowledge may come, for instance, from pathway repositories like KEGG, or be inferred from data of a pilot study. The Bayesian framework provides a natural means of including such prior knowledge. Based on a Bayesian Simultaneous Equation Model, we develop an appealing Empirical Bayes (EB) procedure that automatically assesses the agreement of the used prior knowledge with the data at hand. We use a variational Bayes method to approximate the posterior densities and compare its accuracy with that of a Gibbs sampling strategy. Our method is computationally fast, and can outperform known competitors. In a simulation study, we show that accurate prior data can greatly improve the reconstruction of the network, but need not harm the reconstruction if wrong. We demonstrate the benefits of the method in an analysis of gene expression data from GEO. In particular, the edges of the recovered network have superior reproducibility (compared to that of competitors) over resampled versions of the data.
Affiliation(s)
- Gino B Kpogbezan
- Department of Mathematics, University of Leiden, Niels Bohrweg 1, 2333 CA Leiden, The Netherlands
- Aad W van der Vaart
- Department of Mathematics, University of Leiden, Niels Bohrweg 1, 2333 CA Leiden, The Netherlands
- Wessel N van Wieringen
- Department of Mathematics, Vrije Universiteit Amsterdam, De Boelelaan 1081, 1081 HV Amsterdam, The Netherlands; Department of Epidemiology and Biostatistics, VU University Medical Center, 1007 MB Amsterdam, The Netherlands
- Gwenaël G R Leday
- MRC Biostatistics Unit, Cambridge Institute of Public Health, Forvie Site, Cambridge, CB2 0SR, United Kingdom
- Mark A van de Wiel
- Department of Mathematics, Vrije Universiteit Amsterdam, De Boelelaan 1081, 1081 HV Amsterdam, The Netherlands; Department of Epidemiology and Biostatistics, VU University Medical Center, 1007 MB Amsterdam, The Netherlands
26
Altarawy D, Eid FE, Heath LS. PEAK: Integrating Curated and Noisy Prior Knowledge in Gene Regulatory Network Inference. J Comput Biol 2017; 24:863-873. [PMID: 28294630 DOI: 10.1089/cmb.2016.0199] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 12/24/2022] Open
Abstract
With the abundance of biological data, computational prediction of gene regulatory networks (GRNs) from gene expression data has become more feasible. Although incorporating other prior knowledge (PK), along with gene expression data, greatly improves prediction accuracy, the overall accuracy is still low. PK in GRN inference can be categorized into noisy and curated. In noisy PK, such as transcription factor binding and protein-protein interactions, relations between genes do not necessarily correspond to regulatory relations and are thus considered inaccurate by inference algorithms. In contrast, curated PK consists of experimentally verified regulatory interactions in pathway databases. An issue in real data is that gene expression can poorly support the curated PK, and thus most existing prediction algorithms cannot use this curated PK. Although several algorithms were proposed to incorporate noisy PK, none address curated PK with poor gene expression support. We present PEAK, a system to integrate both curated and noisy PK in GRN inference, especially with poor gene expression support. We introduce a novel method for GRN inference, CurInf, to effectively integrate curated PK, even when the gene expression data poorly support the PK. PEAK also uses the previously proposed method Modified Elastic Net to incorporate noisy PK, and we call it NoisInf. In our experiment, CurInf significantly incorporates curated PK, which was regarded as noise by previous methods. Using 100% curated PK, CurInf improves the area under precision-recall curve accuracy score over NoisInf by 27.3% in synthetic data, 86.5% in Escherichia coli data, and 31.1% in Saccharomyces cerevisiae data. Moreover, even when the noise in PK is 10 times more than true PK, PEAK performs better than inference without any PK. Better integration of curated PK helps biologists benefit from verified experimental data to predict more reliable GRNs.
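The abstract above reports accuracy as an area under the precision-recall curve. As a rough illustration of how such a score can be computed for a ranked list of predicted regulatory edges, the sketch below uses average precision, a common step-wise approximation of that area. It is not taken from PEAK; the function name, the edge tuples, and the scores are hypothetical.

```python
def average_precision(scored_edges, true_edges):
    """Average precision of a ranked edge list against a reference edge set.

    `scored_edges` is a list of (edge, score) pairs and `true_edges` is a set
    of reference edges. True edges that are never predicted count as missed.
    """
    if not true_edges:
        return 0.0
    ranked = sorted(scored_edges, key=lambda item: item[1], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (edge, _score) in enumerate(ranked, start=1):
        if edge in true_edges:
            hits += 1
            precision_sum += hits / rank  # precision at this recall step
    return precision_sum / len(true_edges)


# Hypothetical predictions for a three-gene toy network.
predicted = [(("A", "B"), 0.9), (("A", "C"), 0.5), (("B", "C"), 0.4)]
reference = {("A", "B"), ("B", "C")}
toy_score = average_precision(predicted, reference)
```

Here the two true edges are recovered at ranks 1 and 3, so the score is the mean of the precisions 1/1 and 2/3.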
Collapse
Affiliation(s)
- Doaa Altarawy
- 1 Department of Computer Science, Virginia Tech, Blacksburg, Virginia; 2 Department of Computer and Systems Engineering, Alexandria University, Alexandria, Egypt
| | - Fatma-Elzahraa Eid
- 1 Department of Computer Science, Virginia Tech, Blacksburg, Virginia; 3 Department of Systems and Computer Engineering, Al-Azhar University, Cairo, Egypt
| | - Lenwood S Heath
- 1 Department of Computer Science, Virginia Tech, Blacksburg, Virginia
| |
|
27
|
|
28
|
Abstract
Motivation: Markov networks are undirected graphical models that are widely used to infer relations between genes from experimental data. Their state-of-the-art inference procedures assume the data arise from a Gaussian distribution. High-throughput omics data, such as that from next generation sequencing, often violates this assumption. Furthermore, when collected data arise from multiple related but otherwise nonidentical distributions, their underlying networks are likely to have common features. New principled statistical approaches are needed that can deal with different data distributions and jointly consider collections of datasets. Results: We present FuseNet, a Markov network formulation that infers networks from a collection of nonidentically distributed datasets. Our approach is computationally efficient and general: given any number of distributions from an exponential family, FuseNet represents model parameters through shared latent factors that define neighborhoods of network nodes. In a simulation study, we demonstrate good predictive performance of FuseNet in comparison to several popular graphical models. We show its effectiveness in an application to breast cancer RNA-sequencing and somatic mutation data, a novel application of graphical models. Fusion of datasets offers substantial gains relative to inference of separate networks for each dataset. Our results demonstrate that network inference methods for non-Gaussian data can help in accurate modeling of the data generated by emergent high-throughput technologies. Availability and implementation: Source code is at https://github.com/marinkaz/fusenet. Contact: blaz.zupan@fri.uni-lj.si Supplementary information: Supplementary information is available at Bioinformatics online.
Affiliation(s)
- Marinka Žitnik
- Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | - Blaž Zupan
- Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| |
|
29
|
Alyass A, Turcotte M, Meyre D. From big data analysis to personalized medicine for all: challenges and opportunities. BMC Med Genomics 2015; 8:33. [PMID: 26112054 PMCID: PMC4482045 DOI: 10.1186/s12920-015-0108-y] [Citation(s) in RCA: 240] [Impact Index Per Article: 24.0] [Received: 01/21/2015] [Accepted: 06/15/2015] [Indexed: 02/07/2023]
Abstract
Recent advances in high-throughput technologies have led to the emergence of systems biology as a holistic science to achieve more precise modeling of complex diseases. Many predict the emergence of personalized medicine in the near future. We are, however, moving from two-tiered health systems to a two-tiered personalized medicine. Omics facilities are restricted to affluent regions, and personalized medicine is likely to widen the growing gap in health systems between high- and low-income countries. This is mirrored by an increasing lag between our ability to generate and analyze big data. Several bottlenecks slow down the transition from conventional to personalized medicine: generation of cost-effective high-throughput data; hybrid education and multidisciplinary teams; data storage and processing; data integration and interpretation; and individual and global economic relevance. This review provides an update of important developments in the analysis of big data and forward strategies to accelerate the global transition to personalized medicine.
Affiliation(s)
- Akram Alyass
- Department of Clinical Epidemiology and Biostatistics, McMaster University, 1280 Main Street West, Hamilton, ON, Canada.
| | - Michelle Turcotte
- Department of Clinical Epidemiology and Biostatistics, McMaster University, 1280 Main Street West, Hamilton, ON, Canada.
| | - David Meyre
- Department of Clinical Epidemiology and Biostatistics, McMaster University, 1280 Main Street West, Hamilton, ON, Canada.
- Department of Pathology and Molecular Medicine, McMaster University, 1280 Main Street West, Hamilton, ON, Canada.
| |
|
30
|
Liu ZP. Reverse Engineering of Genome-wide Gene Regulatory Networks from Gene Expression Data. Curr Genomics 2015; 16:3-22. [PMID: 25937810 PMCID: PMC4412962 DOI: 10.2174/1389202915666141110210634] [Citation(s) in RCA: 54] [Impact Index Per Article: 5.4] [Received: 03/07/2014] [Revised: 09/05/2014] [Accepted: 09/05/2014] [Indexed: 12/17/2022]
Abstract
Transcriptional regulation plays vital roles in many fundamental biological processes. Reverse engineering of genome-wide regulatory networks from high-throughput transcriptomic data provides a promising way to characterize the global scenario of regulatory relationships between regulators and their targets. In this review, we summarize and categorize the main frameworks and methods currently available for inferring transcriptional regulatory networks from microarray gene expression profiling data. We give an overview of each strategy and introduce representative methods. Their assumptions, advantages, shortcomings, and possible improvements and extensions are also clarified and discussed.
Affiliation(s)
- Zhi-Ping Liu
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
| |
|
31
|
Durmuş S, Çakır T, Özgür A, Guthke R. A review on computational systems biology of pathogen-host interactions. Front Microbiol 2015; 6:235. [PMID: 25914674 PMCID: PMC4391036 DOI: 10.3389/fmicb.2015.00235] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.9] [Received: 12/01/2014] [Accepted: 03/10/2015] [Indexed: 12/27/2022]
Abstract
Pathogens manipulate the cellular mechanisms of host organisms via pathogen-host interactions (PHIs) in order to take advantage of the capabilities of host cells, leading to infections. The crucial role of these interspecies molecular interactions in initiating and sustaining infections necessitates a thorough understanding of the corresponding mechanisms. Unlike the traditional approach of considering the host or pathogen separately, a systems-level approach that considers the PHI system as a whole is indispensable to elucidate the mechanisms of infection. Following the technological advances in the post-genomic era, PHI data have been produced on a large scale within the last decade. Systems biology-based methods for the inference and analysis of PHI regulatory, metabolic, and protein-protein networks to shed light on infection mechanisms are in increasing demand thanks to the availability of omics data. The knowledge derived from the PHIs may largely contribute to the identification of new and more efficient therapeutics to prevent or cure infections. There are recent efforts for the detailed documentation of these experimentally verified PHI data through Web-based databases. Despite these advances in data archiving, there are still large amounts of PHI data in the biomedical literature yet to be discovered, and novel text mining methods are in development to unearth such hidden data. Here, we review a collection of recent studies on computational systems biology of PHIs with a special focus on the methods for the inference and analysis of PHI networks, covering also the Web-based databases and text-mining efforts to unravel the data hidden in the literature.
Affiliation(s)
- Saliha Durmuş
- Computational Systems Biology Group, Department of Bioengineering, Gebze Technical University, Kocaeli, Turkey
| | - Tunahan Çakır
- Computational Systems Biology Group, Department of Bioengineering, Gebze Technical University, Kocaeli, Turkey
| | - Arzucan Özgür
- Department of Computer Engineering, Boğaziçi University, Istanbul, Turkey
| | - Reinhard Guthke
- Leibniz Institute for Natural Product Research and Infection Biology – Hans-Knoell-Institute, Jena, Germany
| |
|
32
|
Linde J, Schulze S, Henkel SG, Guthke R. Data- and knowledge-based modeling of gene regulatory networks: an update. EXCLI JOURNAL 2015; 14:346-78. [PMID: 27047314 PMCID: PMC4817425 DOI: 10.17179/excli2015-168] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 01/29/2015] [Accepted: 02/10/2015] [Indexed: 02/01/2023]
Abstract
Gene regulatory network inference is a systems biology approach which predicts interactions between genes with the help of high-throughput data. In this review, we present current and updated network inference methods focusing on novel techniques for data acquisition, network inference assessment, network inference for interacting species and the integration of prior knowledge. Following the advance of next-generation sequencing of cDNAs derived from RNA samples (RNA-Seq), we discuss in detail its application to network inference. Furthermore, we present progress for large-scale or even full-genomic network inference as well as for small-scale condensed network inference and review advances in the evaluation of network inference methods by crowdsourcing. Finally, we reflect on the current availability of data and prior knowledge sources and give an outlook for the inference of gene regulatory networks that reflect interacting species, in particular pathogen-host interactions.
Affiliation(s)
- Jörg Linde
- Research Group Systems Biology / Bioinformatics, Leibniz Institute for Natural Product Research and Infection Biology - Hans-Knöll-Institute, Beutenbergstr. 11a, 07745 Jena, Germany
| | - Sylvie Schulze
- Research Group Systems Biology / Bioinformatics, Leibniz Institute for Natural Product Research and Infection Biology - Hans-Knöll-Institute, Beutenbergstr. 11a, 07745 Jena, Germany
| | | | - Reinhard Guthke
- Research Group Systems Biology / Bioinformatics, Leibniz Institute for Natural Product Research and Infection Biology - Hans-Knöll-Institute, Beutenbergstr. 11a, 07745 Jena, Germany
| |
|