1
Alali M, Imani M. Deep Reinforcement Learning Data Collection for Bayesian Inference of Hidden Markov Models. IEEE Transactions on Artificial Intelligence 2025; 6:1217-1232. [PMID: 40313356 PMCID: PMC12045110 DOI: 10.1109/tai.2024.3515939] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/03/2025]
Abstract
Hidden Markov Models (HMMs) are a powerful class of dynamical models for representing complex systems that are partially observed through sensory data. Existing data collection methods for HMMs, typically based on active learning or heuristic approaches, face challenges in terms of efficiency in stochastic domains with costly data. This paper introduces a Bayesian lookahead data collection method for inferring HMMs with finite state and parameter spaces. The method optimizes data collection under uncertainty using a belief state that captures the joint distribution of system states and models. Unlike traditional approaches that prioritize short-term gains, this policy accounts for the long-term impact of data collection decisions to improve inference performance over time. We develop a deep reinforcement learning policy that approximates the optimal Bayesian solution by simulating system trajectories offline. This pre-trained policy can be executed in real-time, dynamically adapting to new conditions as data is collected. The proposed framework supports a wide range of inference objectives, including point-based, distribution-based, and causal inference. Experimental results across three distinct systems demonstrate significant improvements in inference accuracy and robustness, showcasing the effectiveness of the approach in uncertain and data-limited environments.
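The belief state described in this abstract, a joint distribution over system states and candidate models, can be illustrated with a small filtering step. This is a minimal sketch and not the authors' implementation: the function names, the array layout, and the toy two-model, two-state numbers are all assumptions made for illustration, and the reinforcement learning policy that chooses what data to collect is not shown.

```python
import numpy as np


def update_joint_belief(belief, transitions, emissions, observation):
    """One Bayesian filtering step over the joint (model, state) space.

    belief:      (M, S) array, belief[m, s] = P(model=m, state=s | data so far)
    transitions: (M, S, S) array, transitions[m, i, j] = P(next=j | current=i, model=m)
    emissions:   (M, S, O) array, emissions[m, s, o] = P(obs=o | state=s, model=m)
    observation: index of the newly observed symbol
    """
    # Predict: push each model's state distribution through its own dynamics.
    predicted = np.einsum("mi,mij->mj", belief, transitions)
    # Correct: weight by the likelihood of the observation under each (model, state).
    unnormalized = predicted * emissions[:, :, observation]
    # Normalize over the whole joint space so the belief stays a distribution.
    return unnormalized / unnormalized.sum()


def model_posterior(belief):
    """Marginalize out the state to get the posterior over candidate models."""
    return belief.sum(axis=1)


# Hypothetical toy problem: two candidate models that share dynamics but
# differ in how informative their observations are.
prior = np.full((2, 2), 0.25)
transitions = np.array([[[0.9, 0.1], [0.1, 0.9]]] * 2)
emissions = np.array([
    [[0.9, 0.1], [0.1, 0.9]],  # model 0: observation tracks the state
    [[0.5, 0.5], [0.5, 0.5]],  # model 1: observation is pure noise
])
posterior = update_joint_belief(prior, transitions, emissions, observation=0)
```

Any inference objective that depends only on this belief (for example, the entropy of `model_posterior(posterior)`) could serve as a reward signal in such a setup, which is one way to read the abstract's claim that the framework supports several inference objectives.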
Affiliation(s)
- Mohammad Alali
- Department of Electrical and Computer Engineering at Northeastern University
- Mahdi Imani
- Department of Electrical and Computer Engineering at Northeastern University
2
Hua W, Cui R, Yang H, Zhang J, Liu C, Sun J. Uncovering critical transitions and molecule mechanisms in disease progressions using Gaussian graphical optimal transport. Commun Biol 2025; 8:575. [PMID: 40189710 PMCID: PMC11973219 DOI: 10.1038/s42003-025-07995-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2024] [Accepted: 03/25/2025] [Indexed: 04/09/2025] Open
Abstract
Understanding disease progression is crucial for detecting critical transitions and finding trigger molecules, which in turn facilitates early diagnosis and intervention. However, the high dimensionality of the data and the lack of aligned samples across disease stages make these tasks challenging. We present a computational framework, Gaussian Graphical Optimal Transport (GGOT), for analyzing disease progression. GGOT uses Gaussian graphical models, incorporating protein interaction networks, to characterize the data distribution at each disease stage. We then use population-level optimal transport to calculate the Wasserstein distances and transport between stages, enabling us to detect critical transitions. By analyzing the per-molecule transport distance, we quantify the importance of each molecule and identify trigger molecules. Moreover, GGOT predicts the occurrence of critical transitions in unseen samples and visualizes the disease progression process. We apply GGOT to a simulated dataset and six disease datasets with varying disease progression rates to substantiate its effectiveness. Compared to existing methods, GGOT exhibits superior performance in detecting critical transitions.
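When each disease stage is summarized by a Gaussian distribution, the squared 2-Wasserstein distance between two stages has a standard closed form: the squared distance between the means plus a trace term over the covariances. The sketch below implements that textbook formula with NumPy. It is not taken from the GGOT paper: the function names are hypothetical, and the way GGOT estimates covariances from protein interaction networks or computes per-molecule distances is not reproduced here.

```python
import numpy as np


def psd_sqrt(matrix):
    """Symmetric square root of a positive semidefinite matrix."""
    eigvals, eigvecs = np.linalg.eigh(matrix)
    # Clip tiny negative eigenvalues caused by round-off.
    eigvals = np.clip(eigvals, 0.0, None)
    return (eigvecs * np.sqrt(eigvals)) @ eigvecs.T


def gaussian_w2_squared(mean_a, cov_a, mean_b, cov_b):
    """Squared 2-Wasserstein distance between N(mean_a, cov_a) and N(mean_b, cov_b).

    W2^2 = ||mean_a - mean_b||^2
           + trace(cov_a + cov_b - 2 * (cov_b^{1/2} cov_a cov_b^{1/2})^{1/2})
    """
    root_b = psd_sqrt(cov_b)
    cross = psd_sqrt(root_b @ cov_a @ root_b)
    mean_term = float(np.sum((mean_a - mean_b) ** 2))
    cov_term = float(np.trace(cov_a + cov_b - 2.0 * cross))
    return mean_term + cov_term


# Hypothetical two-dimensional stages used only to exercise the formula.
zero = np.zeros(2)
identity = np.eye(2)
shift_only = gaussian_w2_squared(zero, identity, np.array([3.0, 4.0]), identity)
scale_only = gaussian_w2_squared(zero, identity, zero, 4.0 * identity)
```

A sharp jump in this distance between consecutive stages is one plausible signal of a critical transition, in the spirit of the abstract; the thresholds and per-molecule decomposition used by the authors are left unspecified here.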
Affiliation(s)
- Wenbo Hua
- School of Mathematics and Statistics, Xi'an Jiaotong University, No.28 Xianning West Rd., Xi'an, 710049, Shaanxi, China
- Ruixia Cui
- Key Laboratory of Surgical Critical Care and Life Support (Xi'an Jiaotong University), Ministry of Education, No.28 Xianning West Rd., Xi'an, 710049, Shaanxi, China
- Department of Hepatobiliary Surgery and Liver Transplantation, The Second Affiliated Hospital of Xi'an Jiaotong University, No.154 West 5th Rd., Xi'an, 710004, Shaanxi, China
- Heran Yang
- School of Mathematics and Statistics, Xi'an Jiaotong University, No.28 Xianning West Rd., Xi'an, 710049, Shaanxi, China
- Jingyao Zhang
- Key Laboratory of Surgical Critical Care and Life Support (Xi'an Jiaotong University), Ministry of Education, No.28 Xianning West Rd., Xi'an, 710049, Shaanxi, China
- Department of SICU, The First Affiliated Hospital of Xi'an Jiaotong University, No.227 Yanta West Rd., Xi'an, 710061, Shaanxi, China
- Chang Liu
- Key Laboratory of Surgical Critical Care and Life Support (Xi'an Jiaotong University), Ministry of Education, No.28 Xianning West Rd., Xi'an, 710049, Shaanxi, China
- Department of Hepatobiliary Surgery and Liver Transplantation, The Second Affiliated Hospital of Xi'an Jiaotong University, No.154 West 5th Rd., Xi'an, 710004, Shaanxi, China
- Jian Sun
- School of Mathematics and Statistics, Xi'an Jiaotong University, No.28 Xianning West Rd., Xi'an, 710049, Shaanxi, China
3
Alali M, Kazeminajafabadi A, Imani M. Deep Reinforcement Learning Sensor Scheduling for Effective Monitoring of Dynamical Systems. Systems Science & Control Engineering 2024; 12:2329260. [PMID: 38680720 PMCID: PMC11044865 DOI: 10.1080/21642583.2024.2329260] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2023] [Accepted: 03/04/2024] [Indexed: 05/01/2024]
Abstract
Advances in technology have enabled the use of sensors with varied modalities to monitor different parts of systems, each providing a different level of information about the underlying system. However, in most complex systems, limits on resources and computational power restrict the number of sensors and the amount of data that can be processed in real time. These challenges necessitate selecting or scheduling a subset of sensors whose measurements best serve the monitoring objectives. This paper focuses on sensor scheduling for systems modeled by hidden Markov models. Although several sensor selection and scheduling methods have been developed, existing methods tend to be greedy and do not take into account the long-term impact of the selected sensors on monitoring objectives. This paper formulates optimal sensor scheduling as a reinforcement learning problem defined over the posterior distribution of system states. Further, the paper derives a deep reinforcement learning policy for offline learning of the sensor scheduling policy, which can then be executed in real time as new information unfolds. The proposed method applies to any monitoring objective that can be expressed in terms of the posterior distribution of the states (e.g., state estimation or information gain). The accuracy and robustness of the proposed method are investigated for monitoring the security of networked systems and the health of gene regulatory networks.
Affiliation(s)
- Mohammad Alali
- Northeastern University, 360 Huntington Ave, Boston, MA, 02115, U.S
- Mahdi Imani
- Northeastern University, 360 Huntington Ave, Boston, MA, 02115, U.S
4
Li R, Rozum JC, Quail MM, Qasim MN, Sindi SS, Nobile CJ, Albert R, Hernday AD. Inferring gene regulatory networks using transcriptional profiles as dynamical attractors. PLoS Comput Biol 2023; 19:e1010991. [PMID: 37607190 PMCID: PMC10473541 DOI: 10.1371/journal.pcbi.1010991] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2023] [Revised: 09/01/2023] [Accepted: 07/19/2023] [Indexed: 08/24/2023] Open
Abstract
Genetic regulatory networks (GRNs) regulate the flow of genetic information from the genome to expressed messenger RNAs (mRNAs) and thus are critical to controlling the phenotypic characteristics of cells. Numerous methods exist for profiling mRNA transcript levels and identifying protein-DNA binding interactions at the genome-wide scale. These enable researchers to determine the structure and output of transcriptional regulatory networks, but uncovering the complete structure and regulatory logic of GRNs remains a challenge. The field of GRN inference aims to meet this challenge using computational modeling to derive the structure and logic of GRNs from experimental data and to encode this knowledge in Boolean networks, Bayesian networks, ordinary differential equation (ODE) models, or other modeling frameworks. However, most existing models do not incorporate dynamic transcriptional data since it has historically been less widely available in comparison to "static" transcriptional data. We report the development of an evolutionary algorithm-based ODE modeling approach (named EA) that integrates kinetic transcription data and the theory of attractor matching to infer GRN architecture and regulatory logic. Our method outperformed six leading GRN inference methods, none of which incorporate kinetic transcriptional data, in predicting regulatory connections among TFs when applied to a small-scale engineered synthetic GRN in Saccharomyces cerevisiae. Moreover, we demonstrate the potential of our method to predict unknown transcriptional profiles that would be produced upon genetic perturbation of the GRN governing a two-state cellular phenotypic switch in Candida albicans. We established an iterative refinement strategy to facilitate candidate selection for experimentation; the experimental results in turn provide validation or improvement for the model. In this way, our GRN inference approach can expedite the development of a sophisticated mathematical model that can accurately describe the structure and dynamics of the in vivo GRN.
Affiliation(s)
- Ruihao Li
- Quantitative and Systems Biology Graduate Program, University of California, Merced, Merced, California, United States of America
- Jordan C. Rozum
- Department of Systems Science and Industrial Engineering, Binghamton University (State University of New York), Binghamton, New York, United States of America
- Morgan M. Quail
- Quantitative and Systems Biology Graduate Program, University of California, Merced, Merced, California, United States of America
- Mohammad N. Qasim
- Quantitative and Systems Biology Graduate Program, University of California, Merced, Merced, California, United States of America
- Suzanne S. Sindi
- Department of Applied Mathematics, University of California, Merced, Merced, California, United States of America
- Clarissa J. Nobile
- Department of Molecular Cell Biology, University of California, Merced, Merced, California, United States of America
- Health Sciences Research Institute, University of California, Merced, Merced, California, United States of America
- Réka Albert
- Department of Physics, Pennsylvania State University, University Park, University Park, Pennsylvania, United States of America
- Department of Biology, Pennsylvania State University, University Park, University Park, Pennsylvania, United States of America
- Aaron D. Hernday
- Department of Molecular Cell Biology, University of California, Merced, Merced, California, United States of America
- Health Sciences Research Institute, University of California, Merced, Merced, California, United States of America
5
Martins YC, Ziviani A, Cerqueira e Costa MDO, Cavalcanti MCR, Nicolás MF, de Vasconcelos ATR. PPIntegrator: semantic integrative system for protein-protein interaction and application for host-pathogen datasets. Bioinformatics Advances 2023; 3:vbad067. [PMID: 37359724 PMCID: PMC10290227 DOI: 10.1093/bioadv/vbad067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2022] [Revised: 04/28/2023] [Accepted: 05/30/2023] [Indexed: 06/28/2023]
Abstract
Summary: Over the last 20 years, semantic web standards have proved important in promoting data formalization and interlinking between existing knowledge graphs. In this context, several ontologies and data integration initiatives have emerged in recent years in biology, such as the widely used Gene Ontology, which contains metadata to annotate gene function and subcellular location. Another important subject in biology is protein-protein interactions (PPIs), which have applications such as protein function inference. Current PPI databases use heterogeneous export methods, which makes their integration and analysis challenging. Several ontologies covering some concepts of the PPI domain are now available to promote interoperability across datasets. However, efforts to establish guidelines for automatic semantic data integration and analysis of PPIs in these datasets are limited. Here, we present PPIntegrator, a system that semantically describes data related to protein interactions. We also introduce an enrichment pipeline to generate, predict and validate new potential host-pathogen datasets by transitivity analysis. PPIntegrator contains a data preparation module to organize data from three reference databases and a triplification and data fusion module to describe the provenance information and results. This work provides an overview of the PPIntegrator system applied to integrate and compare host-pathogen PPI datasets from four bacterial species using our proposed transitivity analysis pipeline. We also demonstrate some critical queries for analyzing this kind of data and highlight the importance and usage of the semantic data generated by our system. Availability and implementation: https://github.com/YasCoMa/ppintegrator, https://github.com/YasCoMa/ppi_validation_process and https://github.com/YasCoMa/predprin.
Affiliation(s)
- Yasmmin Côrtes Martins
- Bioinformatics Laboratory, National Laboratory for Scientific Computing, Petrópolis 25651-076, Brazil
- Artur Ziviani
- Data Extreme Laboratory (DEXL), National Laboratory for Scientific Computing, Petrópolis 25651-076, Brazil
- Marisa Fabiana Nicolás
- Bioinformatics Laboratory, National Laboratory for Scientific Computing, Petrópolis 25651-076, Brazil
6
Up-Regulated Proteins Have More Protein-Protein Interactions than Down-Regulated Proteins. Protein J 2022; 41:591-595. [PMID: 36221012 PMCID: PMC9552713 DOI: 10.1007/s10930-022-10081-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/01/2022] [Indexed: 11/11/2022]
Abstract
Microarray technology has been used successfully in many biological studies to predict protein–protein interactions (PPIs) computationally. In normal tissue, the cell regulation process begins with transcription and ends with translation. However, when cell regulation goes wrong, cancer occurs. Microarray data can provide accurate expression levels for normal and cancer-affected cells, which can be useful for identifying disease-related genes. First, the differentially expressed genes (DEGs) are extracted from the cancer microarray dataset in order to identify the genes that are up-regulated and down-regulated during cancer progression in the human body. Then, the proteins corresponding to these genes are collected from NCBI, and the STRING web server is used to build the PPI network of these proteins. Interestingly, up-regulated proteins always have a higher number of PPIs than down-regulated proteins, even though, in most of the datasets, the majority of the DEGs are down-regulated. We hope this study will help to build a relevant model to analyze the process of cancer progression in the human body.
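The comparison at the heart of this abstract, counting interactions per protein and contrasting the up-regulated and down-regulated groups, can be sketched in a few lines once a PPI edge list is available. The code below is an illustrative assumption, not the study's pipeline: the function names are hypothetical, the protein labels and edges are made-up placeholders rather than data from the paper, and retrieving proteins from NCBI or interactions from STRING is not shown.

```python
from collections import Counter


def interaction_counts(edges):
    """Count distinct interaction partners per protein in an undirected edge list."""
    # Deduplicate edges regardless of direction and drop self-interactions.
    unique_pairs = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
    counts = Counter()
    for pair in unique_pairs:
        for protein in pair:
            counts[protein] += 1
    return counts


def mean_interactions(proteins, counts):
    """Average number of interactions over a group; absent proteins count as 0."""
    proteins = list(proteins)
    if not proteins:
        return 0.0
    return sum(counts[p] for p in proteins) / len(proteins)


# Placeholder network: U* stand for up-regulated, D* for down-regulated proteins.
toy_edges = [
    ("U1", "U2"), ("U1", "D1"), ("U2", "U3"),
    ("U1", "U3"), ("D1", "D2"), ("U2", "U1"),  # last edge duplicates the first
]
counts = interaction_counts(toy_edges)
up_mean = mean_interactions(["U1", "U2", "U3"], counts)
down_mean = mean_interactions(["D1", "D2", "D3"], counts)
```

Averaging per protein rather than summing per group matters here, because the abstract notes that down-regulated genes are usually the majority of the DEGs; a raw total would be confounded by group size.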