1. Markov decision process design: A framework for integrating strategic and operational decisions. Operations Research Letters 2024; 54:107090. PMID: 38560724; PMCID: PMC10979703; DOI: 10.1016/j.orl.2024.107090.
Abstract
We consider the problem of optimally designing a system for repeated use under uncertainty. We develop a modeling framework that integrates the design and operational phases, which are represented by a mixed-integer program and discounted-cost infinite-horizon Markov decision processes, respectively. We seek to simultaneously minimize the design costs and the subsequent expected operational costs. This problem setting arises naturally in several application areas, as we illustrate through examples. We derive a bilevel mixed-integer linear programming formulation for the problem and perform a computational study to demonstrate that realistic instances can be solved numerically.
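The coupling of a one-shot design decision with a discounted-cost operational MDP can be illustrated by brute-force enumeration: solve the operational MDP for each candidate design and add the design cost. This is only a toy sketch — the design names, costs, and transition matrices below are hypothetical, and the paper itself uses a bilevel mixed-integer formulation rather than enumeration.

```python
import numpy as np

def value_iteration(P, c, gamma=0.9, tol=1e-8):
    """Solve a discounted-cost infinite-horizon MDP by value iteration.
    P[a][s, s'] are transition probabilities, c[a][s] are stage costs."""
    n = P[0].shape[0]
    V = np.zeros(n)
    while True:
        Q = np.array([c[a] + gamma * P[a] @ V for a in range(len(P))])
        V_new = Q.min(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

# Two actions, two states; the design choice changes the operating costs.
P = [np.array([[0.8, 0.2], [0.3, 0.7]]),
     np.array([[0.5, 0.5], [0.1, 0.9]])]
designs = {"cheap":  (10.0, [np.array([2.0, 5.0]), np.array([3.0, 4.0])]),
           "robust": (25.0, [np.array([1.0, 2.0]), np.array([1.5, 1.5])])}

# Total cost of a design = design cost + expected operational cost from state 0.
best = min(designs, key=lambda d: designs[d][0]
           + value_iteration(P, designs[d][1])[0])
```

Here the pricier "robust" design wins because its lower stage costs dominate once discounted operational costs are accounted for — the trade-off the integrated formulation captures.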
2. A Data-Driven Framework for Clinical Decision Support Systems in Positive Airway Pressure and Oxygen Titration. J Clin Med 2024; 13:757. PMID: 38337451; PMCID: PMC10856483; DOI: 10.3390/jcm13030757.
Abstract
BACKGROUND Current obstructive sleep apnea (OSA) treatment relies on manual PAP titration, which has limitations. Complex interactions during titration and variations in SpO2 data accuracy pose challenges. Patients with co-occurring chronic hypercapnia may require precise oxygen titration. To address these issues, we propose a Clinical Decision Support System using Markov decision processes. METHODS This study, compliant with data protection laws, focused on adults with OSA-induced hypoxemia utilizing supplemental oxygen and CPAP/BiPAP therapy. PAP titration, conducted over one night, involved vigilant monitoring of vital signs and physiological parameters. Adjustments to CPAP pressure, potential BiLevel transitions, and supplemental oxygen were precisely guided by patient metrics. Markov decision processes outlined three treatment actions for disorder management, incorporating expert medical insights. RESULTS In our study involving 14 OSA patients (average age: 63 years, 27% female, BMI 41 kg/m²), significant improvements were observed in key health parameters after manual titration. The initial AHI of 61.8 events per hour significantly decreased to an average of 18.0 events per hour after PAP and oxygen titration (p < 0.0001), indicating a substantial reduction in sleep-disordered breathing severity. Concurrently, SpO2 levels increased significantly from an average of 79.7% before titration to 89.1% after titration (p < 0.0003). Pearson correlation coefficients demonstrated aggravation of hypercapnia in 50% of patients (N = 5) with initial pCO2 < 55 mmHg during the increase in CPAP pressure. However, transitioning to BiPAP reduced pCO2 levels, showcasing its efficacy in addressing hypercapnia. Simultaneously, BiPAP therapy correlated with a substantial increase in SpO2, underscoring its positive impact on oxygenation in OSA patients. Markov decision process analysis demonstrated realistic patient behavior during stable night conditions, emphasizing minimal apnea and good tolerance of high CPAP pressure. CONCLUSIONS The development of a framework for Markov decision processes of PAP and oxygen titration holds promise for improving pCO2 and SpO2 values. While challenges remain, including the need for high-quality data, the potential benefits in terms of patient management and care optimization are substantial, and this approach represents an exciting frontier in telemedicine and respiratory healthcare.
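The decision-process structure can be made concrete with a minimal value-iteration sketch over three hypothetical patient states and the abstract's three action types. The transition probabilities below are illustrative placeholders, not clinically derived — a real system would parameterize them from titration data and expert input.

```python
import numpy as np

# Hypothetical 3-state titration MDP: 0 = stable, 1 = hypoxemic, 2 = hypercapnic.
# Actions: 0 = raise CPAP, 1 = switch to BiPAP, 2 = raise supplemental O2.
# P[s, a, s'] rows are invented for illustration, not clinically validated.
P = np.array([
    [[0.90, 0.05, 0.05], [0.85, 0.05, 0.10], [0.88, 0.07, 0.05]],  # from stable
    [[0.50, 0.30, 0.20], [0.55, 0.35, 0.10], [0.60, 0.30, 0.10]],  # from hypoxemic
    [[0.30, 0.10, 0.60], [0.55, 0.10, 0.35], [0.25, 0.10, 0.65]],  # from hypercapnic
])
r = np.array([1.0, -1.0, -1.0])  # reward of landing in each state

gamma = 0.95
V = np.zeros(3)
for _ in range(500):
    Q = (P * (r + gamma * V)).sum(axis=2)  # Q[s, a] = one-step lookahead value
    V = Q.max(axis=1)
policy = Q.argmax(axis=1)  # greedy policy at each patient state
```

With these placeholder dynamics, the computed policy switches to BiPAP in the hypercapnic state — mirroring the abstract's observation that BiPAP transitions reduced pCO2.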
3. CCGN: Centralized collaborative graphical transformer multi-agent reinforcement learning for multi-intersection signal free-corridor. Neural Netw 2023; 166:396-409. PMID: 37549608; DOI: 10.1016/j.neunet.2023.07.027.
Abstract
Tackling traffic signal control through multi-agent reinforcement learning is a widely employed approach. However, current state-of-the-art models have drawbacks: intersections optimize their own local rewards, causing traffic to waste time and fuel in a start-stop mode at each intersection. They also lack information sharing among intersections, and their specialized policies hinder adaptation to new traffic scenarios. To overcome these limitations, this work presents a centralized collaborative graph network (CCGN) whose core objective is a signal-free corridor: once traffic flows have waited at an entry intersection on either side of the network, each subsequent intersection opens its signal as the traffic arrives. CCGN combines a local policy network (LPN) with a global policy network (GPN). The LPN, employed at each intersection, predicts actions using a Transformer and a Graph Convolutional Network (GCN); the GPN, based on a GCN and a Q-network, receives the LPN states, traffic flow, and road information to coordinate intersections and provide a signal-free corridor. To this end, we developed the Deep Graph Convolution Q-Network (DGCQ) by combining a Deep Q-Network (DQN) and a GCN; DGCQ leverages the GCN's intersection collaboration and the DQN's information aggregation for traffic-control decisions. The proposed CCGN model is trained on a robust synthetic traffic network and evaluated on real-world traffic networks, where it outperforms other state-of-the-art models.
4. Distorted probability operator for dynamic portfolio optimization in times of socio-economic crisis. Central European Journal of Operations Research 2022; 31:1-18. PMID: 36531521; PMCID: PMC9734642; DOI: 10.1007/s10100-022-00834-0.
Abstract
A robust optimal control of discrete-time Markov chains with finite terminal time T and bounded costs or wealth using probability distortion is studied. The time inconsistency of these distortion operators, and the resulting inapplicability of dynamic programming, are discussed. To address this, dynamic versions of these operators are introduced, and their suitability for dynamic programming is demonstrated. Based on the dynamic programming algorithm, the existence of an optimal policy is established, and an application of the theory to portfolio optimization, along with a numerical study, is presented.
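The dynamic (recursively applied) distortion idea can be sketched as a backward recursion in which each one-step expectation is replaced by a Choquet integral under a power distortion. The two-outcome asset, the distortion exponent 0.7, and the action grid are all hypothetical choices for illustration, not the paper's setup.

```python
def choquet(values, probs, g):
    """Choquet integral (distorted expectation) of a discrete random variable
    under a distortion g: [0, 1] -> [0, 1] with g(0) = 0 and g(1) = 1."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    exp, tail, prev = 0.0, 0.0, 0.0
    for i in order:                      # sweep outcomes from best to worst
        tail += probs[i]
        exp += values[i] * (g(tail) - prev)
        prev = g(tail)
    return exp

g = lambda p: p ** 0.7  # concave power distortion (overweights good outcomes)

# Toy two-period portfolio: each period choose the risky fraction a; the
# risky asset returns +20% or -10% with equal probability.
ACTIONS = (0.0, 0.5, 1.0)

def solve(wealth, t, T=2):
    """Backward recursion applying the distortion at every stage — the
    recursive application is what restores time consistency."""
    if t == T:
        return wealth, None
    best_v, best_a = float("-inf"), None
    for a in ACTIONS:
        vals = [solve(wealth * (1 + a * r), t + 1)[0] for r in (0.20, -0.10)]
        v = choquet(vals, [0.5, 0.5], g)
        if v > best_v:
            best_v, best_a = v, a
    return best_v, best_a

v0, a0 = solve(100.0, 0)  # distorted value and first-period action
```

Because the concave distortion overweights the favorable outcome, the recursion selects the fully risky allocation at every stage in this toy instance.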
5. Toward automatic motivator selection for autism behavior intervention therapy. Universal Access in the Information Society 2022; 22:1-23. PMID: 36160369; PMCID: PMC9483340; DOI: 10.1007/s10209-022-00914-7.
Abstract
Children with autism spectrum disorder (ASD) usually show little interest in academic activities and may display disruptive behavior when presented with assignments. Research indicates that incorporating motivational variables during interventions results in improvements in behavior and academic performance. However, the impact of such motivational variables varies between children. In this paper, we aim to address the problem of selecting the right motivator for children with ASD using reinforcement learning by adapting to the most influential factors impacting the effectiveness of the contingent motivator used. We model the task of selecting a motivator as a Markov decision process problem. The states, actions and rewards design consider the factors that impact the effectiveness of a motivator based on applied behavior analysis as well as learners' individual preferences. We use a Q-learning algorithm to solve the modeled problem. Our proposed solution is then implemented as a mobile application developed for special education plans coordination. To evaluate the motivator selection feature, we conduct a study involving a group of teachers and therapists and assess how the added feature aids the participants in their decision-making process of selecting a motivator. Preliminary results indicated that the motivator selection feature improved the usability of the mobile app. Analysis of the algorithm performance showed promising results and indicated improvement of the recommendations over time.
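A minimal sketch of the selection loop: tabular values with an ε-greedy rule, simplified here to a single-step (bandit-style) update rather than the paper's full Q-learning formulation. The task types, number of motivators, and engagement probabilities are invented for illustration.

```python
import random

random.seed(0)
# Hypothetical setup: 2 task types, 3 candidate motivators; true_p stands in
# for the (unknown) probability that each motivator engages the child.
true_p = {0: [0.2, 0.8, 0.4], 1: [0.7, 0.3, 0.4]}
Q = {(s, a): 0.0 for s in true_p for a in range(3)}
N = {(s, a): 0 for s in true_p for a in range(3)}
eps = 0.2

for _ in range(6000):
    s = random.choice([0, 1])                       # task presented
    if random.random() < eps:
        a = random.randrange(3)                     # explore
    else:
        a = max(range(3), key=lambda a: Q[(s, a)])  # exploit current estimate
    r = 1.0 if random.random() < true_p[s][a] else 0.0  # engagement signal
    N[(s, a)] += 1
    Q[(s, a)] += (r - Q[(s, a)]) / N[(s, a)]        # sample-average update

best = {s: max(range(3), key=lambda a: Q[(s, a)]) for s in true_p}
```

Over repeated sessions the value estimates converge toward the true engagement rates, so the recommended motivator per task type improves with use — the behavior the preliminary evaluation reports.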
6. A new perspective on breast cancer diagnostic guidelines to reduce overdiagnosis. Production and Operations Management 2022; 31:2361-2378. PMID: 35915601; PMCID: PMC9313854; DOI: 10.1111/poms.13691.
Abstract
Overdiagnosis of breast cancer, defined as diagnosing a cancer that would otherwise not cause symptoms or death in a patient's lifetime, costs the U.S. health care system over $1.2 billion annually. Overdiagnosis rates, estimated to be around 10%-40%, may be reduced if indolent breast findings can be identified and followed with noninvasive imaging rather than biopsy. However, there are no validated guidelines for radiologists to decide when to choose imaging options that recognize cancer grades and types. The aim of this study is to optimize breast cancer diagnostic decisions based on cancer types using a large-scale finite-horizon Markov decision process (MDP) model with 4.6 million states to help reduce overdiagnosis. We prove the optimality of a divide-and-search algorithm that relies on tight upper bounds on the optimal decision thresholds to find an exact optimal solution. We project the high-dimensional MDP onto two lower-dimensional MDPs and obtain feasible upper bounds on the optimal decision thresholds. We use real data from two private mammography databases and demonstrate our model's performance through a previously validated simulation model that has been used by policy makers to set national screening guidelines in the United States. We find that a decision-analytical framework optimizing diagnostic decisions while accounting for breast cancer types has strong potential to improve quality of life and alleviate the immense costs of overdiagnosis. Our model leads to a 20% reduction in overdiagnosis in the screening population, which translates into annual savings of approximately $300 million for the U.S. health care system.
7. Teaching Multiple Inverse Reinforcement Learners. Front Artif Intell 2021; 4:625183. PMID: 34604737; PMCID: PMC8482012; DOI: 10.3389/frai.2021.625183.
Abstract
In this paper, we propose the first machine teaching algorithm for multiple inverse reinforcement learners. As our initial contribution, we formalize the problem of optimally teaching a sequential task to a heterogeneous class of learners. We then contribute a theoretical analysis of this problem, identifying conditions under which it is possible to conduct such teaching using the same demonstration for all learners. Our analysis shows that, contrary to other teaching problems, teaching a sequential task to a heterogeneous class of learners with a single demonstration may not be possible as the differences between individual agents increase. We then contribute two algorithms that address the main difficulties identified by our theoretical analysis. The first algorithm, which we dub SplitTeach, starts by teaching the class as a whole until all students have learned all that they can learn as a group; it then teaches each student individually, ensuring that all students are able to perfectly acquire the target task. The second approach, which we dub JointTeach, selects a single demonstration to be provided to the whole class so that all students learn the target task as well as a single demonstration allows. While SplitTeach ensures optimal teaching at the cost of a bigger teaching effort, JointTeach ensures minimal effort, although the learners are not guaranteed to perfectly recover the target task. We conclude by illustrating our methods in several simulation domains. The simulation results agree with our theoretical findings, showcasing that class teaching is not always possible in the presence of heterogeneous students.
At the same time, they also illustrate the main properties of our proposed algorithms: in all domains, SplitTeach guarantees perfect teaching and, in terms of teaching effort, is always at least as good as individualized teaching (often better); on the other hand, JointTeach attains minimal teaching effort in all domains, even if sometimes it compromises the teaching performance.
8. Dynamic capacity allocation in a radiology service considering different types of patients, individual no-show probabilities, and overbooking. BMC Health Serv Res 2021; 21:968. PMID: 34521414; PMCID: PMC8442351; DOI: 10.1186/s12913-021-06918-y.
Abstract
Background We propose a mathematical model formulated as a finite-horizon Markov Decision Process (MDP) to allocate capacity in a radiology department that serves different types of patients. To the best of our knowledge, this is the first attempt at considering radiology resources with different capacities and individual no-show probabilities of ambulatory patients in an MDP model. To mitigate the negative impacts of no-show, overbooking rules are also investigated. Methods The model’s main objective is to identify an optimal policy for allocating the available capacity such that waiting, overtime, and penalty costs are minimized. Optimization is carried out using traditional dynamic programming (DP). The model was applied to real data from a radiology department of a large Brazilian public hospital. The optimal policy is compared with five alternative policies, one of which resembles the one currently used by the department. We identify among alternative policies the one that performs closest to the optimal. Results The optimal policy presented the best performance (smallest total daily cost) in the majority of analyzed scenarios (212 out of 216). Numerical analyses allowed us to recommend the use of the optimal policy for capacity allocation with a double overbooking rule and two resources available in overtime periods. An alternative policy in which outpatients are prioritized for service (rather than inpatients) displayed results closest to the optimal policy, being also recommended due to its easy implementation. Conclusions Based on such recommendation and observing the state of the system at any given period (representing the number of patients waiting for service), radiology department managers should be able to make a decision (i.e., define number and type of patients) that should be selected for service such that the system’s cost is minimized. 
Supplementary Information The online version contains supplementary material available at 10.1186/s12913-021-06918-y.
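The flavor of the overbooking analysis can be shown with a single-day calculation: choose how many patients to book into a fixed number of slots when each patient independently shows up. The capacity, show probability, and cost rates below are hypothetical; the paper's full model is a finite-horizon MDP over patient types and periods.

```python
from math import comb

def expected_cost(n, capacity, p_show, c_idle, c_over):
    """Expected cost of booking n patients into `capacity` slots when each
    patient independently shows with probability p_show. Idle slots cost
    c_idle each; patients beyond capacity incur overtime cost c_over each."""
    cost = 0.0
    for k in range(n + 1):  # k patients actually show (binomial outcome)
        pk = comb(n, k) * p_show ** k * (1 - p_show) ** (n - k)
        cost += pk * (c_idle * max(0, capacity - k)
                      + c_over * max(0, k - capacity))
    return cost

# Hypothetical clinic: 8 slots, 80% show rate, idle cost 30, overtime cost 50.
best_n = min(range(16), key=lambda n: expected_cost(n, 8, 0.8, 30.0, 50.0))
```

With these numbers the optimal booking level is 9, i.e., a single-patient overbook: the expected idle cost saved by one extra booking outweighs the expected overtime it induces, but a double overbook does not.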
9. On incorporating forecasts into linear state space model Markov decision processes. Philosophical Transactions. Series A, Mathematical, Physical, and Engineering Sciences 2021; 379:20190430. PMID: 34092099; PMCID: PMC8182152; DOI: 10.1098/rsta.2019.0430.
Abstract
Weather forecast information will very likely find increasing application in the control of future energy systems. In this paper, we introduce an augmented state space model formulation with linear dynamics, within which one can incorporate forecast information that is dynamically revealed alongside the evolution of the underlying state variable. We use the martingale model for forecast evolution (MMFE) to enforce the necessary consistency properties that must govern the joint evolution of forecasts with the underlying state. The formulation also generates jointly Markovian dynamics that give rise to Markov decision processes (MDPs) that remain computationally tractable. This paper is the first to enforce MMFE consistency requirements within an MDP formulation that preserves tractability. This article is part of the theme issue 'The mathematics of energy systems'.
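A small simulation of the martingale model for forecast evolution (MMFE) that the formulation builds on: every remaining forecast receives a zero-mean revision each period, so forecasts stay unbiased at every lead time. The horizon, volatility, and initial forecast level are arbitrary illustration values.

```python
import random

random.seed(2)

def evolve(horizon=5, sigma=1.0, level=10.0):
    """One MMFE sample path: f[s] is the current forecast for time s, and
    the forecast for s is frozen (realized) once the process reaches s."""
    f = [level] * horizon
    for t in range(horizon):
        for s in range(t + 1, horizon):
            f[s] += random.gauss(0.0, sigma)  # zero-mean forecast revision
    return f

# Martingale (consistency) check: realized values at the final lead time
# average out to the initial forecast across many sample paths.
finals = [evolve()[-1] for _ in range(2000)]
mean_final = sum(finals) / len(finals)
```

Appending the vector of outstanding forecasts to the physical state is what yields the jointly Markovian dynamics the paper exploits: the next forecast vector depends only on the current one plus the fresh zero-mean revisions.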
10. Dynamic Programming for Resource Allocation in Multi-Allelic Trait Introgression. Frontiers in Plant Science 2021; 12:544854. PMID: 34220873; PMCID: PMC8253225; DOI: 10.3389/fpls.2021.544854.
Abstract
Trait introgression is a complex process that plant breeders use to introduce desirable alleles from one variety or species into another. Two of the major types of decisions that must be made during this sophisticated and uncertain workflow are parental selection and resource allocation. We formulated the trait introgression problem as an engineering process and proposed a Markov decision process (MDP) model to optimize the resource allocation procedure. The efficiency of the MDP model was compared with that of static resource allocation strategies, and the trade-offs among budget, deadline, and probability of success are demonstrated. Simulation results suggest that dynamic resource allocation strategies from the MDP model significantly improve the efficiency of trait introgression by allocating the right amount of resources according to the genetic outcomes of previous generations.
11. Learning and Planning for Time-Varying MDPs Using Maximum Likelihood Estimation. Journal of Machine Learning Research 2021; 22:1-40. PMID: 35002545; PMCID: PMC8739185.
Abstract
This paper proposes a formal approach to online learning and planning for agents operating in a priori unknown, time-varying environments. The proposed method computes the maximally likely model of the environment, given the observations about the environment made by an agent earlier in the system run and assuming knowledge of a bound on the maximal rate of change of system dynamics. Such an approach generalizes the estimation method commonly used in learning algorithms for unknown Markov decision processes with time-invariant transition probabilities, but is also able to quickly and correctly identify the system dynamics following a change. Based on the proposed method, we generalize the exploration bonuses used in learning for time-invariant Markov decision processes by introducing a notion of uncertainty in a learned time-varying model, and develop a control policy for time-varying Markov decision processes based on the exploitation and exploration trade-off. We demonstrate the proposed methods on four numerical examples: a patrolling task with a change in system dynamics, a two-state MDP with periodically changing outcomes of actions, a wind flow estimation task, and a multi-armed bandit problem with periodically changing probabilities of different rewards.
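The estimation idea can be illustrated with a sliding-window maximum-likelihood estimate of a drifting transition probability, where the window length plays the role of the paper's bound on the rate of change. This is a simplified stand-in, not the paper's exact estimator, and the change point and probabilities are invented.

```python
import random

random.seed(1)

def stay_prob(t):
    """True (unknown to the agent) self-transition probability,
    with an abrupt change in the system dynamics at t = 300."""
    return 0.9 if t < 300 else 0.3

window = 50
outcomes, windowed = [], []
for t in range(600):
    stay = 1.0 if random.random() < stay_prob(t) else 0.0
    outcomes.append(stay)
    recent = outcomes[-window:]
    windowed.append(sum(recent) / len(recent))   # windowed MLE of P(stay)

# Time-invariant MLE over the whole history, for contrast.
full_history = sum(outcomes) / len(outcomes)
```

The windowed estimate tracks the dynamics both before and after the change, whereas the time-invariant estimator (the one standard learning algorithms for fixed MDPs would use) settles near the uninformative average of the two regimes.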
12. Assessing Mathematics Misunderstandings via Bayesian Inverse Planning. Cogn Sci 2020; 44:e12900. PMID: 33063866; DOI: 10.1111/cogs.12900.
Abstract
Online educational technologies offer opportunities for providing individualized feedback and detailed profiles of students' skills. Yet many technologies for mathematics education assess students based only on the correctness of either their final answers or responses to individual steps. In contrast, examining the choices students make for how to solve the equation and the ways in which they might answer incorrectly offers the opportunity to obtain a more nuanced perspective of their algebra skills. To automatically make sense of step-by-step solutions, we propose a Bayesian inverse planning model for equation solving that computes an assessment of a learner's skills based on her pattern of errors in individual steps and her choices about what sequence of problem-solving steps to take. Bayesian inverse planning builds on existing machine learning tools to create a generative model relating (mis)-understandings to equation solving choices. Two behavioral experiments demonstrate that the model can interpret people's equation solving and that its assessments are consistent with those of experienced teachers. A third experiment uses this model to tailor guidance for learners based on individual differences in misunderstandings, closing the loop between assessing understanding, and using that assessment within an educational technology. Finally, because the bottleneck in applying inverse planning to a new domain is in creating the model of possible student misunderstandings, we show how to combine inverse planning with an existing production rule model to make inferences about student misunderstandings of fraction arithmetic.
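In miniature, the inference step is Bayes rule over a space of candidate (mis)understandings, with each hypothesis assigning a likelihood to every observed solution step. The equation, hypotheses, step names, and likelihood values here are invented placeholders; the actual model derives such likelihoods from a generative model of equation-solving choices.

```python
# Hypothetical example: two hypotheses about a learner solving 2x + 3 = 7.
# "correct" subtracts 3 then divides; "sign-error" adds 3 instead.
step_likelihood = {
    "correct":    {"subtract_3": 0.90, "add_3": 0.05, "divide_2": 0.9},
    "sign-error": {"subtract_3": 0.10, "add_3": 0.80, "divide_2": 0.9},
}
prior = {"correct": 0.5, "sign-error": 0.5}

def posterior(observed_steps):
    """Bayes rule over hypotheses given an observed step sequence."""
    post = dict(prior)
    for step in observed_steps:
        post = {h: p * step_likelihood[h][step] for h, p in post.items()}
        z = sum(post.values())
        post = {h: p / z for h, p in post.items()}  # renormalize
    return post

p = posterior(["add_3", "divide_2"])  # learner added 3, then divided
```

Observing the "add 3" step shifts nearly all posterior mass onto the sign-error hypothesis, which is the kind of diagnosis that can then drive tailored guidance.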
13. Analysis of Mammography Screening Schedules under Varying Resource Constraints for Planning Breast Cancer Control Programs in Low- and Middle-Income Countries: A Mathematical Study. Med Decis Making 2020; 40:364-378. PMID: 32160823; DOI: 10.1177/0272989x20910724.
Abstract
Background. Low- and middle-income countries (LMICs) have a higher mortality-to-incidence ratio for breast cancer than high-income countries (HICs) because of late-stage diagnosis. Mammography screening is recommended for early diagnosis; however, the infrastructure capacity in LMICs is far below that needed to adopt current screening guidelines. Current guidelines are extrapolations from HICs, as limited data have restricted model development specific to LMICs; thus, economic analyses of screening schedules tailored to infrastructure capacities are unavailable. Methods. We applied a new Markov process method for developing cancer progression models and a Markov decision process model to identify optimal screening schedules under a varying number of lifetime screenings per person, a proxy for infrastructure capacity. We modeled Peru, a middle-income country, as a case study and the United States, an HIC, for validation. Results. Implementing 2, 5, 10, and 15 lifetime screens would require about 55, 135, 280, and 405 mammography machines, respectively, and would save 31, 62, 95, and 112 life-years per 1000 women, respectively. Current guidelines recommend 15 lifetime screens, but Peru has only 55 mammography machines nationally. With this capacity, the best strategy is 2 lifetime screenings, at ages 50 and 56 years. As infrastructure is scaled up to accommodate 5 and 10 lifetime screens, screening between ages 44-61 and 41-64 years, respectively, would have the best impact. Our results for the United States are consistent with other models and current guidelines. Limitations. The scope of our model is limited to analysis of national-level guidelines; we did not model heterogeneity across the country. Conclusions. Country-specific optimal screening schedules under varying infrastructure capacities can systematically guide the development of cancer control programs and the planning of health investments.
14. Spatial Distribution of Eye-Movements After Central Vision Loss is Consistent with an Optimal Visual Search Strategy. Int J Neural Syst 2019; 29:1950026. PMID: 31711331; DOI: 10.1142/s0129065719500266.
Abstract
The problem of gaze allocation has previously been studied in the framework of eye-movement control models, which require prior knowledge of visibility maps (VMs). These encode the signal-to-noise ratio, at each point in the visual field, which can be used to define an optimal policy of gaze allocation. However, it is not always possible to estimate the VM, in a given experimental setting, as it depends on many factors, including the visual system of the individual observer. Hence, few eye-movement datasets include the corresponding VM estimates. This can be problematic for the analysis of certain clinical conditions, such as Age-related Macular Degeneration (AMD), which are associated with reduced sensitivity in the affected locations of the visual field. The corresponding VMs are highly idiosyncratic, and cannot be modeled by estimates obtained from healthy observers. We propose an algorithm for maximum likelihood VM estimation, working directly from eye-movement sequences. We apply this algorithm to two eye-tracking datasets, based on visual search tasks, obtained from AMD patients. We show that the inferred VMs are spatially consistent with the measured visual field sensitivities. We also show that simulations with the estimated VMs can account for the asymmetric distribution of saccade vectors, which is typical of AMD patients.
15.
Abstract
The vision for precision medicine is to use individual patient characteristics to inform a personalized treatment plan that leads to the best possible health-care for each patient. Mobile technologies have an important role to play in this vision as they offer a means to monitor a patient's health status in real-time and subsequently to deliver interventions if, when, and in the dose that they are needed. Dynamic treatment regimes formalize individualized treatment plans as sequences of decision rules, one per stage of clinical intervention, that map current patient information to a recommended treatment. However, most existing methods for estimating optimal dynamic treatment regimes are designed for a small number of fixed decision points occurring on a coarse time-scale. We propose a new reinforcement learning method for estimating an optimal treatment regime that is applicable to data collected using mobile technologies in an out-patient setting. The proposed method accommodates an indefinite time horizon and minute-by-minute decision making that are common in mobile health applications. We show that the proposed estimators are consistent and asymptotically normal under mild conditions. The proposed methods are applied to estimate an optimal dynamic treatment regime for controlling blood glucose levels in patients with type 1 diabetes.
16.
Abstract
The free-energy principle is an attempt to explain the structure of the agent and its brain, starting from the fact that an agent exists (Friston and Stephan, 2007; Friston et al., 2010). More specifically, it can be regarded as a systematic attempt to understand the 'fit' between an embodied agent and its niche, where the quantity of free-energy is a measure for the 'misfit' or disattunement (Bruineberg and Rietveld, 2014) between agent and environment. This paper offers a proof-of-principle simulation of niche construction under the free-energy principle. Agent-centered treatments have so far failed to address situations where environments change alongside agents, often due to the action of agents themselves. The key point of this paper is that the minimum of free-energy is not a point at which the agent is maximally adapted to the statistics of a static environment, but is better conceptualized as an attracting manifold within the joint agent-environment state-space as a whole, toward which the system tends through mutual interaction. We provide a general introduction to active inference and the free-energy principle. Using Markov decision processes (MDPs), we then describe a canonical generative model and the ensuing update equations that minimize free-energy. We then apply these equations to simulations of foraging in an environment in which an agent learns the most efficient path to a pre-specified location. In some of those simulations, unbeknownst to the agent, 'desire paths' emerge as a function of the agent's own activity (i.e., niche construction occurs). We show how, depending on the relative inertia of the environment and agent, the joint agent-environment system moves to different attracting sets of jointly minimized free-energy.
17.
Abstract
BACKGROUND The cost-effectiveness and value of additional information about a health technology or program may change over time because of trends affecting patient cohorts and/or the intervention. Delaying information collection, even for parameters that do not change over time, may be optimal. METHODS We present a stochastic dynamic programming approach to simultaneously identify the optimal intervention and information collection policies. We use our framework to evaluate birth-cohort hepatitis C virus (HCV) screening. We focus on how the presence of a time-varying parameter (HCV prevalence) affects the optimal information collection policy for a parameter assumed constant across birth cohorts: the liver fibrosis stage distribution for screen-detected diagnosis at age 50. RESULTS We prove that it may be optimal to delay information collection until a time when the information more immediately affects decision making. For the example of HCV screening, given initial beliefs, the optimal policy (as of 2010) was to continue screening and collect information about the distribution of liver fibrosis at screen-detected diagnosis in 12 years, increasing the expected incremental net monetary benefit (INMB) by $169.5 million compared to current guidelines. CONCLUSIONS The option to delay information collection until the information is sufficiently likely to influence decisions can increase efficiency. A dynamic programming framework enables an assessment of the marginal value of information and determines the optimal policy, including when and how much information to collect.
18. Behavior Model Calibration for Epidemic Simulations. Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS) 2018; 2018:1640-1648. PMID: 34305482; PMCID: PMC8300053.
Abstract
Computational epidemiologists frequently employ large-scale agent-based simulations of human populations to study disease outbreaks and assess intervention strategies. The agents used in such simulations rarely capture the real-world decision-making of human beings. An absence of realistic agent behavior can undermine the reliability of insights generated by such simulations and might make them ill-suited for informing public health policies. In this paper, we address this problem by developing a methodology to create and calibrate an agent decision making model for a large multi-agent simulation, using survey data. Our method optimizes a cost vector associated with the various behaviors to match the behavior distributions observed in a detailed survey of human behaviors during influenza outbreaks. Our approach is a data-driven way of incorporating decision making for agents in large-scale epidemic simulations.
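One simple way to phrase the calibration step this abstract describes is to fit a cost vector so that model-implied choice probabilities match the observed behavior distribution. The logit choice form and the survey frequencies below are our illustrative assumptions, not necessarily the paper's exact model.

```python
import math

# Fit a cost vector c so that logit choice probabilities softmax(-c)
# match observed behavior frequencies (hypothetical survey data).
observed = [0.6, 0.3, 0.1]          # invented frequencies of three behaviors

def choice_probs(c):
    z = [math.exp(-ci) for ci in c]  # lower cost -> higher choice probability
    s = sum(z)
    return [zi / s for zi in z]

# The gradient of the cross-entropy loss w.r.t. c is (observed - p),
# so plain gradient descent suffices for this small calibration.
c = [0.0, 0.0, 0.0]
for _ in range(5000):
    p = choice_probs(c)
    c = [cj - 0.5 * (oj - pj) for cj, oj, pj in zip(c, observed, p)]
fitted = choice_probs(c)
```

The fitted probabilities reproduce the survey distribution; in a full agent-based simulation the same loss would be evaluated by running the simulator rather than in closed form.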
|
19
|
Population-level intervention and information collection in dynamic healthcare policy. Health Care Manag Sci 2017; 21:604-631. [PMID: 28887763 PMCID: PMC6208882 DOI: 10.1007/s10729-017-9415-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/23/2017] [Accepted: 08/10/2017] [Indexed: 12/09/2022]
Abstract
We develop a general framework for optimal health policy design in a dynamic setting. We consider a hypothetical medical intervention for a cohort of patients in which one parameter varies across cohorts with imperfectly observable linear dynamics. We seek to identify the optimal time to change the current health intervention policy and the optimal time to collect decision-relevant information. We formulate this problem as a discrete-time, infinite-horizon Markov decision process and establish structural properties in terms of first- and second-order monotonicity. We demonstrate that it is generally optimal to delay information acquisition until an effect on decisions is sufficiently likely. We apply this framework to the evaluation of hepatitis C virus (HCV) screening in the general population, determining which birth cohorts to screen for HCV and when to collect information about HCV prevalence.
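The discounted infinite-horizon MDP at the core of this formulation can be solved by standard value iteration; the toy instance below (invented numbers, not the HCV model) sketches the fixed-point computation.

```python
# Value iteration for a discounted infinite-horizon MDP (toy numbers).
GAMMA = 0.9
states, actions = [0, 1], [0, 1]
P = {0: {0: {0: 0.8, 1: 0.2}, 1: {0: 0.2, 1: 0.8}},
     1: {0: {0: 0.5, 1: 0.5}, 1: {0: 0.9, 1: 0.1}}}
r = {0: {0: 1.0, 1: 0.0}, 1: {0: 0.0, 1: 2.0}}

V = {s: 0.0 for s in states}
while True:
    V_new = {s: max(r[s][a] + GAMMA * sum(P[s][a][s2] * V[s2] for s2 in states)
                    for a in actions) for s in states}
    diff = max(abs(V_new[s] - V[s]) for s in states)
    V = V_new
    if diff < 1e-12:          # Bellman operator is a contraction, so this halts
        break
```

Monotonicity results like those in the abstract are typically proved on exactly this Bellman recursion, by showing the maximization preserves ordering in the state.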
|
20
|
Multi-Objective Markov Decision Processes for Data-Driven Decision Support. JOURNAL OF MACHINE LEARNING RESEARCH : JMLR 2016; 17:211. [PMID: 28018133 PMCID: PMC5179144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
We present new methodology based on Multi-Objective Markov Decision Processes for developing sequential decision support systems from data. Our approach uses sequential decision-making data to provide support that is useful to many different decision-makers, each with different, potentially time-varying preferences. To accomplish this, we develop an extension of fitted-Q iteration for multiple objectives that computes policies for all scalarization functions, i.e., preference functions, simultaneously from continuous-state, finite-horizon data. We identify and address several conceptual and computational challenges along the way, and we introduce a new solution concept that is appropriate when different actions have similar expected outcomes. Finally, we demonstrate an application of our method using data from the Clinical Antipsychotic Trials of Intervention Effectiveness and show that our approach offers decision-makers increased choice through a larger class of optimal policies.
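A one-step analogue of "policies for all scalarization functions" is easy to sketch: sweep a linear preference weight over two objectives and record which actions are ever optimal. The actions and their expected two-objective outcomes below are invented for illustration.

```python
# Sweep linear scalarizations w*obj1 + (1-w)*obj2 over a weight grid and
# collect every action that is optimal for some preference (toy outcomes).
outcomes = {"A": (3.0, 0.0), "B": (0.0, 3.0), "C": (2.0, 2.0), "D": (1.0, 1.0)}

optimal_for_some_w = set()
for k in range(11):
    w = k / 10.0
    best = max(outcomes,
               key=lambda a: w * outcomes[a][0] + (1 - w) * outcomes[a][1])
    optimal_for_some_w.add(best)
# "D" is dominated by "C", so no scalarization ever selects it.
```

The multi-objective fitted-Q extension in the paper does this simultaneously for all weights at every backup, rather than re-solving per weight as in this sketch.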
|
21
|
Simulation-based approximate policy iteration for dynamic patient scheduling for radiation therapy. Health Care Manag Sci 2016; 21:317-325. [PMID: 27766509 DOI: 10.1007/s10729-016-9388-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/23/2016] [Accepted: 10/10/2016] [Indexed: 10/20/2022]
Abstract
We study a radiation therapy scheduling problem in which dynamically and stochastically arriving patients of different types are scheduled to future days. Unlike similar models in the literature, we consider cancellation of treatments. We formulate this dynamic multi-appointment patient scheduling problem as a Markov decision process (MDP). Since the MDP is intractable due to its large state and action spaces, we employ a simulation-based approximate dynamic programming (ADP) approach to approximately solve our model. In particular, we develop a least-squares-based approximate policy iteration algorithm for solving our model. The performance of the ADP approach is compared with that of a myopic heuristic decision rule.
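The policy-evaluation half of a least-squares approximate policy iteration can be sketched compactly. With one-hot (tabular) features, the least-squares fixed point reduces to solving (I - γP)V = r for a fixed policy; the paper estimates these quantities from simulation, whereas this toy sketch plugs in exact transition probabilities for brevity. All numbers are invented.

```python
# LSTD-style policy evaluation for a fixed policy on a 2-state chain.
# With one-hot features the least-squares fixed point solves (I - g*P)V = r.
GAMMA = 0.9
P = [[0.9, 0.1], [0.1, 0.9]]   # transition matrix under the fixed policy
r = [1.0, 0.0]                 # one-step rewards under the fixed policy

a11, a12 = 1 - GAMMA * P[0][0], -GAMMA * P[0][1]
a21, a22 = -GAMMA * P[1][0], 1 - GAMMA * P[1][1]
det = a11 * a22 - a12 * a21
V = [(r[0] * a22 - a12 * r[1]) / det,    # Cramer's rule for the 2x2 system
     (a11 * r[1] - a21 * r[0]) / det]
```

Approximate policy iteration would then take the greedy action with respect to V (policy improvement) and alternate the two steps until the policy stabilizes.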
|
22
|
Abstract
We consider a reinforcement learning setting where the learner is given a set of possible models containing the true model. While there are algorithms that are able to successfully learn optimal behavior in this setting, they do so without trying to identify the underlying true model. Indeed, we show that there are cases in which the attempt to find the true model is doomed to failure.
|
23
|
Dynamic ambulance dispatching: is the closest-idle policy always optimal? Health Care Manag Sci 2016; 20:517-531. [PMID: 27206518 DOI: 10.1007/s10729-016-9368-0] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/11/2015] [Accepted: 05/03/2016] [Indexed: 10/21/2022]
Abstract
We address the problem of ambulance dispatching, in which we must decide which ambulance to send to an incident in real time. In practice, it is commonly believed that the 'closest idle ambulance' rule is near-optimal, and it is used throughout most of the literature. In this paper, we present alternatives to the classical closest idle ambulance rule. Most ambulance providers as well as researchers focus on minimizing the fraction of arrivals later than a certain threshold time, and we show that significant improvements can be obtained by our alternative policies. The first alternative is based on a Markov decision problem (MDP) that models more than just the number of idle vehicles, while remaining computationally tractable for reasonably-sized ambulance fleets. Second, we propose a heuristic for ambulance dispatching that can handle regions with large numbers of ambulances. Our main focus is on minimizing the fraction of arrivals later than a certain threshold time, but we show that with a small adaptation our MDP can also be used to minimize the average response time. We evaluate our policies by simulating a large emergency medical services region in the Netherlands. For this region, we show that our heuristic reduces the fraction of late arrivals by 18% compared to the 'closest idle' benchmark policy. A drawback is that this heuristic increases the average response time (by 37% for this problem instance). Therefore, we do not claim that our heuristic is practically preferable over the closest-idle method. However, our result sheds new light on the popular belief that the closest idle dispatch policy is near-optimal when minimizing the fraction of late arrivals.
|
24
|
From biological models to economic optimization. Prev Vet Med 2014; 118:226-37. [PMID: 25496776 DOI: 10.1016/j.prevetmed.2014.11.019] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/23/2014] [Revised: 07/31/2014] [Accepted: 11/15/2014] [Indexed: 11/21/2022]
Abstract
This article addresses the additional challenges faced when biological models are used as a basis for decision support in livestock herds. The challenges include dealing with uncertain information, observation costs, herd dynamics and methodological issues relating to the computational methods applied, particularly in the dynamic case. The desired key property of information included in models is that it can be used as the basis for unbiased prediction of the future performance of the animals. Often there will be a tradeoff between uncertainty and costs in the sense that the level of uncertainty can be reduced (for instance through additional tests) at some cost. Thus, the decision about which (and how many) tests to perform can be seen as an optimization problem in itself. Another way of expressing the tradeoff is to talk about the value of information, which can sometimes be assessed by modeling different approaches and levels of detail in data collection. Various optimization methods of relevance to herd health management are discussed, with the main emphasis on decision graphs in the static case and Markov decision processes (dynamic programming) in a dynamic context.
|
25
|
Optimization of anemia treatment in hemodialysis patients via reinforcement learning. Artif Intell Med 2014; 62:47-60. [PMID: 25091172 DOI: 10.1016/j.artmed.2014.07.004] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/03/2013] [Revised: 06/23/2014] [Accepted: 07/11/2014] [Indexed: 10/25/2022]
Abstract
OBJECTIVE Anemia is a frequent comorbidity in hemodialysis patients that can be successfully treated by administering erythropoiesis-stimulating agents (ESAs). ESA dosing is currently based on clinical protocols that often do not account for the high inter- and intra-individual variability in the patient's response. As a result, the hemoglobin level of some patients oscillates around the target range, which is associated with multiple risks and side-effects. This work proposes a methodology based on reinforcement learning (RL) to optimize ESA therapy. METHODS RL is a data-driven approach for solving sequential decision-making problems that are formulated as Markov decision processes (MDPs). Computing optimal drug administration strategies for chronic diseases is a sequential decision-making problem in which the goal is to find the best sequence of drug doses. MDPs are particularly suitable for modeling these problems due to their ability to capture the uncertainty associated with the outcome of the treatment and the stochastic nature of the underlying process. The RL algorithm employed in the proposed methodology is fitted Q iteration (FQI), which stands out for its ability to make an efficient use of data. RESULTS The experiments reported here are based on a computational model that describes the effect of ESAs on the hemoglobin level. The performance of the proposed method is evaluated and compared with the well-known Q-learning algorithm and with a standard protocol. Simulation results show that the performance of Q-learning is substantially lower than that of FQI and the protocol. When comparing FQI and the protocol, FQI achieves an increment of 27.6% in the proportion of patients that are within the targeted range of hemoglobin during the period of treatment. In addition, the quantity of drug needed is reduced by 5.13%, which indicates a more efficient use of ESAs.
CONCLUSION Although prospective validation is required, promising results demonstrate the potential of RL to become an alternative to current protocols.
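The core of fitted Q iteration is a repeated regression of Bellman targets computed from a fixed batch of transitions. The sketch below uses a tabular "regressor" (averaging targets per state-action pair) on an invented deterministic toy batch; it is not the paper's anemia model, which regresses over continuous hemoglobin states.

```python
# Fitted Q iteration on a fixed batch of transitions (toy data).
GAMMA = 0.9
# one-step transitions (s, a, reward, s_next); numbers are illustrative
batch = [(0, 0, 1.0, 0), (0, 1, 0.0, 1), (1, 0, 0.0, 1), (1, 1, 2.0, 0)]

Q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}
for _ in range(300):
    # regression step: here a tabular regressor that averages the targets
    targets = {}
    for s, a, rew, s2 in batch:
        y = rew + GAMMA * max(Q[(s2, 0)], Q[(s2, 1)])
        targets.setdefault((s, a), []).append(y)
    Q = {sa: sum(ys) / len(ys) for sa, ys in targets.items()}

greedy = {s: max((0, 1), key=lambda a: Q[(s, a)]) for s in (0, 1)}
```

Swapping the tabular averager for any supervised regressor (trees, neural networks) recovers the general FQI scheme; the data-efficiency the abstract highlights comes from reusing the same batch at every iteration.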
|
26
|
Optimal Policies for Reducing Unnecessary Follow-up Mammography Exams in Breast Cancer Diagnosis. DECISION ANALYSIS 2013; 10:200-224. [PMID: 24501588 PMCID: PMC3910299 DOI: 10.1287/deca.2013.0272] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/05/2023]
Abstract
Mammography is the most effective screening tool for early diagnosis of breast cancer. Based on the mammography findings, radiologists must choose one of the following three alternatives: 1) take immediate diagnostic actions, including prompt biopsy, to confirm breast cancer; 2) recommend a follow-up mammogram; 3) recommend routine annual mammography. There are no validated structured guidelines based on a decision-analytical framework to aid radiologists in making such patient management decisions. Surprisingly, only 15-45% of breast biopsies and less than 1% of short-interval follow-up recommendations are found to be malignant, resulting in unnecessary tests and patient anxiety. We develop a finite-horizon discrete-time Markov decision process (MDP) model that may help radiologists make patient-management decisions to maximize a patient's total expected quality-adjusted life years. We use clinical data to find the policies recommended by the MDP model and also compare them to decisions made by radiologists at a large mammography practice. We also derive the structural properties of the MDP model, including sufficiency conditions that ensure the existence of a double control-limit type policy.
|
27
|
The Effect of Budgetary Restrictions on Breast Cancer Diagnostic Decisions. MANUFACTURING & SERVICE OPERATIONS MANAGEMENT : M & SOM 2012; 14:600-617. [PMID: 24027436 PMCID: PMC3767197 DOI: 10.1287/msom.1110.0371] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/02/2023]
Abstract
We develop a finite-horizon discrete-time constrained Markov decision process (MDP) to model diagnostic decisions after mammography, in which we maximize the total expected quality-adjusted life years (QALYs) of a patient under resource constraints. We use clinical data to estimate the parameters of the MDP model and solve it as a mixed-integer program. By repeating the optimization for a sequence of budget levels, we calculate incremental cost-effectiveness ratios attributable to consecutive levels of funding and compare actual clinical practice with optimal decisions. We prove that the optimal value function is concave in the allocated budget. Compared to actual clinical practice, using optimal thresholds for decision making may result in approximately 22% cost savings without sacrificing QALYs. Our analysis indicates short-term follow-ups are the immediate target for elimination when the budget becomes a concern. Policy change is more drastic in the older age group as the budget increases, yet the gains in total expected QALYs related to larger budgets are predominantly seen in younger women, along with modest gains for older women.
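A budget-constrained MDP can be illustrated with the occupation-measure linear program: maximize expected discounted QALYs subject to flow-balance constraints and a budget cap on expected discounted cost. This is only a discounted LP relaxation of the idea; the paper itself solves a finite-horizon mixed-integer program, and every number below is invented.

```python
from scipy.optimize import linprog

# Occupation-measure LP for a toy budget-constrained discounted MDP.
# Variables x ordered as x(0,0), x(0,1), x(1,0), x(1,1); action 1 is the
# expensive, high-QALY option. gamma = 0.9, initial distribution (0.5, 0.5),
# and every (s,a) moves to state 0 or 1 with probability 0.5 each,
# so the flow coefficients below are 1 - 0.9*0.5 and -0.9*0.5.
qaly = [1.0, 2.0, 1.0, 2.0]
cost = [0.0, 1.0, 0.0, 1.0]
A_eq = [[1 - 0.45, 1 - 0.45, -0.45, -0.45],
        [-0.45, -0.45, 1 - 0.45, 1 - 0.45]]
beta = [0.5, 0.5]

def solve(budget):
    res = linprog(c=[-q for q in qaly],        # linprog minimizes, so negate
                  A_ub=[cost], b_ub=[budget],
                  A_eq=A_eq, b_eq=beta)        # x >= 0 is the default bound
    return -res.fun

tight, loose = solve(2.0), solve(100.0)
```

Re-solving over a grid of budgets, as in `solve` above, traces out the concave value-of-budget curve the abstract proves: here the loose budget strictly dominates the tight one.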
|