1
Ovinnikov I, Beuret A, Cavaliere F, Buhmann JM. Fundamentals of Arthroscopic Surgery Training and beyond: a reinforcement learning exploration and benchmark. Int J Comput Assist Radiol Surg 2024. [PMID: 38684559] [DOI: 10.1007/s11548-024-03116-z] [Received: 03/03/2023] [Accepted: 03/20/2024]
Abstract
PURPOSE: This work presents FASTRL, a benchmark set of instrument manipulation tasks adapted to the domain of reinforcement learning and used in simulated surgical training. This benchmark enables and supports the design and training of human-centric reinforcement learning agents that assist and evaluate human trainees in surgical practice.
METHODS: Simulation tasks from the Fundamentals of Arthroscopic Surgery Training (FAST) program are adapted to the reinforcement learning setting for the purpose of training virtual agents capable of providing assistance and scoring to surgical trainees. A skill performance assessment protocol based on the trained virtual agents is presented.
RESULTS: The proposed benchmark suite presents an API for training reinforcement learning agents in the context of arthroscopic skill training. The evaluation scheme, based on both heuristic and learned reward functions, robustly recovers the ground-truth ranking on a diverse test set of human trajectories.
CONCLUSION: The presented benchmark enables the exploration of a novel reinforcement learning-based approach to skill performance assessment and in-procedure assistance for simulated surgical training scenarios. The evaluation protocol based on the learned reward model demonstrates potential for evaluating the performance of surgical trainees in simulation.
Affiliation(s)
- Ivan Ovinnikov
- Department of Computer Science, ETH Zürich, Zurich, Switzerland.
- Ami Beuret
- Department of Computer Science, ETH Zürich, Zurich, Switzerland
2
Alali M, Kazeminajafabadi A, Imani M. Deep Reinforcement Learning Sensor Scheduling for Effective Monitoring of Dynamical Systems. Syst Sci Control Eng 2024; 12:2329260. [PMID: 38680720] [PMCID: PMC11044865] [DOI: 10.1080/21642583.2024.2329260] [Received: 12/04/2023] [Accepted: 03/04/2024]
Abstract
Advances in technology have enabled the use of sensors with varied modalities to monitor different parts of systems, each providing a different level of information about the underlying system. However, resource limitations and computational power restrict the number of sensors/data streams that can be processed in real time in most complex systems. These challenges necessitate selecting/scheduling a subset of sensors to obtain measurements that best serve the monitoring objectives. This paper focuses on sensor scheduling for systems modeled by hidden Markov models. Despite the development of several sensor selection and scheduling methods, existing methods tend to be greedy and do not take into account the long-term impact of selected sensors on monitoring objectives. This paper formulates optimal sensor scheduling as a reinforcement learning problem defined over the posterior distribution of system states. Further, the paper derives a deep reinforcement learning method for offline learning of the sensor scheduling policy, which can then be executed in real time as new information unfolds. The proposed method applies to any monitoring objective that can be expressed in terms of the posterior distribution of the states (e.g., state estimation, information gain, etc.). The performance of the proposed method in terms of accuracy and robustness is investigated for monitoring the security of networked systems and the health monitoring of gene regulatory networks.
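The paper's central construction, using the posterior (belief) distribution over hidden states as the reinforcement learning state, rests on the standard HMM forward filter. A minimal sketch of that belief update (generic, with made-up transition/emission matrices, not the paper's networked-system or gene-network models):

```python
import numpy as np

def belief_update(belief, T, E, obs):
    """One HMM forward-filter step: predict through the transition
    matrix T, then reweight by the emission likelihoods E[:, obs]."""
    predicted = T.T @ belief            # prior over the next hidden state
    posterior = predicted * E[:, obs]   # weight by observation likelihood
    return posterior / posterior.sum()  # normalize to a distribution

# Hypothetical 2-state system observed through a noisy binary sensor.
T = np.array([[0.9, 0.1],   # T[i, j] = P(next = j | current = i)
              [0.2, 0.8]])
E = np.array([[0.7, 0.3],   # E[i, o] = P(obs = o | state = i)
              [0.1, 0.9]])

b = np.array([0.5, 0.5])    # uniform initial belief
for o in [0, 0, 1]:         # a short observation sequence
    b = belief_update(b, T, E, o)
# b is the posterior an RL scheduling policy would condition on
```

In the scheduling setting, the chosen action (which sensor to query) determines which emission model produces `obs`, and the reward can be any function of `b`, e.g. its negative entropy for an information-gain objective.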
Affiliation(s)
- Mohammad Alali
- Northeastern University, 360 Huntington Ave, Boston, MA 02115, USA
- Mahdi Imani
- Northeastern University, 360 Huntington Ave, Boston, MA 02115, USA
3
Al-Sakkari EG, Ragab A, Dagdougui H, Boffito DC, Amazouz M. Carbon capture, utilization and sequestration systems design and operation optimization: Assessment and perspectives of artificial intelligence opportunities. Sci Total Environ 2024; 917:170085. [PMID: 38224888] [DOI: 10.1016/j.scitotenv.2024.170085] [Received: 07/14/2023] [Revised: 12/10/2023] [Accepted: 01/09/2024]
Abstract
Carbon capture, utilization, and sequestration (CCUS) is a promising solution for decarbonizing the energy and industrial sectors to mitigate climate change. An integrated assessment of technological options is required for the effective deployment of large-scale CCUS infrastructure between CO2 production and utilization/sequestration nodes. However, developing cost-effective strategies, from engineering and operation perspectives, to implement CCUS is challenging. This is due to the diversity of upstream emitting processes located in different geographical areas, available downstream utilization technologies, storage site capacity/location, and current/future energy/emissions/economic conditions. This paper identifies the need for a robust hybrid assessment tool for CCUS modeling, simulation, and optimization based mainly on artificial intelligence (AI) combined with mechanistic methods. Thus, a critical literature review is conducted to assess CCUS technologies and their related process modeling/simulation/optimization techniques, while evaluating the need for improvements or new developments to reduce overall CCUS system design and operation costs. These techniques include first-principles-based and data-driven ones, i.e., AI and related machine learning (ML) methods. In addition, the paper gives an overview of the role of life cycle assessment (LCA) in evaluating CCUS systems, where the combined LCA-AI approach is assessed. Other advanced methods based on AI/ML capabilities/algorithms can be developed to optimize the whole CCUS value chain. Interpretable ML combined with explainable AI can accelerate optimal materials selection by yielding strong rules, which in turn accelerates the design of capture/utilization plants. Furthermore, deep reinforcement learning (DRL) coupled with process simulation can accelerate process design/operation optimization by considering the simultaneous optimization of equipment sizing and operating conditions. Moreover, generative deep learning (GDL) is a key solution for optimal capture/utilization materials design/discovery. The developed AI methods can be generalizable, such that the extracted knowledge can be transferred to future work to help cut the costs of the CCUS value chain.
Affiliation(s)
- Eslam G Al-Sakkari
- Department of Mathematics and Industrial Engineering, Polytechnique Montréal, 2500 Chemin de Polytechnique, Montréal, Québec H3T 1J4, Canada; CanmetENERGY, 1615 Lionel-Boulet Blvd, P.O. Box 4800, Varennes, Québec J3X 1P7, Canada
- Ahmed Ragab
- Department of Mathematics and Industrial Engineering, Polytechnique Montréal, 2500 Chemin de Polytechnique, Montréal, Québec H3T 1J4, Canada; CanmetENERGY, 1615 Lionel-Boulet Blvd, P.O. Box 4800, Varennes, Québec J3X 1P7, Canada
- Hanane Dagdougui
- Department of Mathematics and Industrial Engineering, Polytechnique Montréal, 2500 Chemin de Polytechnique, Montréal, Québec H3T 1J4, Canada
- Daria C Boffito
- Department of Chemical Engineering, Polytechnique Montréal, 2500 Chemin de Polytechnique, Montréal, Québec H3T 1J4, Canada; Canada Research Chair in Engineering Process Intensification and Catalysis (EPIC), Canada
- Mouloud Amazouz
- CanmetENERGY, 1615 Lionel-Boulet Blvd, P.O. Box 4800, Varennes, Québec J3X 1P7, Canada
4
Chen Y, Zhang F, Liu Z. Adaptive bias-variance trade-off in advantage estimator for actor-critic algorithms. Neural Netw 2024; 169:764-777. [PMID: 37981458] [DOI: 10.1016/j.neunet.2023.10.023] [Received: 07/11/2021] [Revised: 06/26/2023] [Accepted: 10/16/2023]
Abstract
Actor-critic methods are leading approaches in many challenging continuous control tasks. Advantage estimators, the most common critics in the actor-critic framework, combine state values from bootstrapped value functions with sample returns. Different combinations balance the bias introduced by state values against the variance introduced by sample returns to reduce estimation errors. The bias and variance fluctuate constantly throughout training, leading to different optimal combinations. However, existing advantage estimators usually use fixed combinations that fail to account for this trade-off when seeking the optimal estimate. Our previous work on adaptive advantage estimation (AAE) analyzed the sources of bias and variance and offered two indicators. This paper further explores the relationship between the indicators and their optimal combination through typical numerical experiments. These analyses lead to a general form of adaptive combinations of state values and sample returns that achieves low estimation errors. Empirical results on simulated robotic locomotion tasks show that our proposed estimators achieve similar or superior performance compared to the generalized advantage estimator (GAE).
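The fixed combination this abstract contrasts against is the λ parameter of the standard generalized advantage estimator (GAE). A minimal reference sketch of plain GAE (not the paper's adaptive estimator, and with made-up reward/value numbers) shows where the bias-variance knob sits:

```python
import numpy as np

def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimation over one trajectory.

    lam = 0 keeps only the one-step TD error (low variance, high bias);
    lam = 1 is the Monte Carlo return minus the value baseline
    (low bias, high variance). `values` carries one extra bootstrap entry.
    """
    advantages = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        td_error = rewards[t] + gamma * values[t + 1] - values[t]
        running = td_error + gamma * lam * running
        advantages[t] = running
    return advantages

rewards = np.array([1.0, 0.0, 1.0])
values = np.array([0.5, 0.4, 0.6, 0.0])  # V(s_0..s_2) plus bootstrap V(s_3)
adv = gae(rewards, values)               # fixed lam; AAE would adapt it
```

An adaptive estimator in the spirit of the paper would replace the fixed `lam` with a value chosen online from bias/variance indicators.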
Affiliation(s)
- Yurou Chen
- The State Key Lab of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
- Fengyi Zhang
- The State Key Lab of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
- Zhiyong Liu
- The State Key Lab of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China; The Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, China
5
Hassani SA, Womelsdorf T. Noradrenergic alpha-2a Receptor Stimulation Enhances Prediction Error Signaling in Anterior Cingulate Cortex and Striatum. bioRxiv 2023:2023.10.25.564052. [PMID: 37961384] [PMCID: PMC10634832] [DOI: 10.1101/2023.10.25.564052]
Abstract
The noradrenergic system has been implicated in supporting behavioral flexibility by increasing exploration during periods of uncertainty and by enhancing working memory for goal-relevant stimuli. Possible sources mediating these pro-cognitive effects are α2A adrenoceptors (α2AR) in the prefrontal cortex or the anterior cingulate cortex facilitating fronto-striatal learning processes. We tested this hypothesis by selectively stimulating α2ARs with guanfacine during feature-based attentional set shifting in nonhuman primates. We found that α2AR stimulation improved learning from errors and facilitated updating the target feature of an attentional set. Neural recordings in the anterior cingulate cortex (ACC), the dorsolateral prefrontal cortex (dlPFC), and the striatum showed that α2AR stimulation selectively enhanced the neural representation of negative reward prediction errors in neurons of the ACC and of positive prediction errors in the striatum, but not in the dlPFC. This modulation was accompanied by enhanced encoding of the feature and location of the attended target across the fronto-striatal network. Enhanced learning was paralleled by enhanced encoding of outcomes in putative fast-spiking interneurons in the ACC, dlPFC, and striatum, but not in broad-spiking cells, pointing to an interneuron-mediated mechanism of α2AR action. These results illustrate that α2A receptors causally support the noradrenergic enhancement of updating attentional sets through enhanced prediction error signaling in the ACC and the striatum.
Affiliation(s)
- Seyed A. Hassani
- Department of Psychology, Vanderbilt University, Nashville, TN 37240
- Vanderbilt Brain Institute, Nashville, TN 37240
- National Institute of Neurological Disorders and Stroke, NIH, Bethesda, MD 20824
- Thilo Womelsdorf
- Department of Psychology, Vanderbilt University, Nashville, TN 37240
- Vanderbilt Brain Institute, Nashville, TN 37240
- Department of Biomedical Engineering, Vanderbilt University, Nashville, TN 37240
6
Zhang M, Lin H, Takagi S, Cao Y, Shahabi C, Xiong L. CSGAN: Modality-Aware Trajectory Generation via Clustering-based Sequence GAN. IEEE Int Conf Mob Data Manag 2023; 2023:148-157. [PMID: 37965426] [PMCID: PMC10644148] [DOI: 10.1109/mdm58254.2023.00032]
Abstract
Human mobility data is useful for various applications in urban planning, transportation, and public health, but collecting and sharing real-world trajectories can be challenging due to privacy and data quality issues. To address these problems, recent research focuses on generating synthetic trajectories, mainly using generative adversarial networks (GANs) trained by real-world trajectories. In this paper, we hypothesize that by explicitly capturing the modality of transportation (e.g., walking, biking, driving), we can generate not only more diverse and representative trajectories for different modalities but also more realistic trajectories that preserve the geographical density, trajectory, and transition level properties by capturing both cross-modality and modality-specific patterns. Towards this end, we propose a Clustering-based Sequence Generative Adversarial Network (CSGAN) that simultaneously clusters the trajectories based on their modalities and learns the essential properties of real-world trajectories to generate realistic and representative synthetic trajectories. To measure the effectiveness of generated trajectories, in addition to typical density and trajectory level statistics, we define several new metrics for a comprehensive evaluation, including modality distribution and transition probabilities both globally and within each modality. Our extensive experiments with real-world datasets show the superiority of our model in various metrics over state-of-the-art models.
7
Zehfroosh A, Tanner HG. PAC Reinforcement Learning Algorithm for General-Sum Markov Games. IEEE Trans Automat Contr 2023; 68:2821-2831. [PMID: 37915545] [PMCID: PMC10617487] [DOI: 10.1109/tac.2022.3219340]
Abstract
This paper presents a theoretical framework for probably approximately correct (PAC) multi-agent reinforcement learning (MARL) algorithms for Markov games. Using the idea of delayed Q-learning, the paper extends the well-known Nash Q-learning algorithm to build a new PAC MARL algorithm for general-sum Markov games. In addition to guiding the design of a provably PAC MARL algorithm, the framework enables checking whether an arbitrary MARL algorithm is PAC. Comparative numerical results demonstrate the algorithm's performance and robustness.
Affiliation(s)
- Ashkan Zehfroosh
- Department of Mechanical Engineering, University of Delaware, Newark, DE 19716 USA
- Herbert G Tanner
- Department of Mechanical Engineering, University of Delaware, Newark, DE 19716 USA
8
Coelho-Magalhães T, Azevedo Coste C, Resende-Martins H. A Novel Functional Electrical Stimulation-Induced Cycling Controller Using Reinforcement Learning to Optimize Online Muscle Activation Pattern. Sensors (Basel) 2022; 22:9126. [PMID: 36501826] [PMCID: PMC9741342] [DOI: 10.3390/s22239126] [Received: 09/16/2022] [Revised: 11/02/2022] [Accepted: 11/08/2022]
Abstract
This study introduces a novel controller based on a Reinforcement Learning (RL) algorithm for real-time adaptation of the stimulation pattern during FES-cycling. Core to our approach is the introduction of an RL agent that interacts with the cycling environment and learns through trial and error how to modulate the electrical charge applied to the stimulated muscle groups according to a predefined policy while tracking a reference cadence. Instead of a static stimulation pattern to be modified by a control law, we hypothesized that a non-stationary baseline set of parameters would better adjust the amount of injected electrical charge to the time-varying characteristics of the musculature. Overground FES-assisted cycling sessions were performed by a subject with spinal cord injury (SCI AIS-A, T8). To track a predefined pedaling cadence, two closed-loop control laws were used simultaneously to modulate the pulse intensity of the stimulation channels responsible for evoking the muscle contractions. First, a Proportional-Integral (PI) controller was used to control the current amplitude of the stimulation channels over an initial parameter setting with predefined pulse amplitude, pulse width, and fixed frequency. In parallel, an RL algorithm with a decayed-epsilon-greedy strategy was implemented to randomly explore nine different variations of pulse amplitude and width parameters over the same stimulation setting, aiming to adjust the injected electrical charge according to a predefined policy. The performance of this global control strategy was evaluated in two different RL settings and explored in two different cycling scenarios. The participant was able to pedal overground for distances over 3.5 km, and the results showed that the RL agent learned to modify the stimulation pattern according to the predefined policy while simultaneously tracking the predefined pedaling cadence. Despite the simplicity of our approach and the existence of more sophisticated RL algorithms, our method can be used to reduce the time needed to define stimulation patterns. Our results suggest interesting research directions to explore in the future to improve cycling performance, since more efficient stimulation cost dynamics can be explored and implemented for the agent to learn.
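The decayed-epsilon-greedy exploration described above can be sketched in a few lines (a generic illustration; the nine actions, Q-update, reward, and decay schedule are placeholders, not the study's stimulation parameters):

```python
import random

def select_action(q_values, epsilon):
    """Epsilon-greedy: with probability epsilon pick a random action
    (explore), otherwise pick the current best-valued action (exploit)."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Nine candidate pulse amplitude/width variations, as in the study,
# but with invented Q-values, reward signal, and decay schedule.
q = [0.0] * 9
epsilon, decay, eps_min = 1.0, 0.995, 0.05
for step in range(1000):
    a = select_action(q, epsilon)
    reward = random.random()                 # stand-in for cadence-tracking reward
    q[a] += 0.1 * (reward - q[a])            # running-average Q update
    epsilon = max(eps_min, epsilon * decay)  # decay toward mostly-greedy
```

Early on the agent samples all nine parameter settings roughly uniformly; as epsilon decays it settles into the setting whose estimated value is highest.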
Affiliation(s)
- Tiago Coelho-Magalhães
- Graduate Program in Electrical Engineering, Universidade Federal de Minas Gerais, Av. Antônio Carlos 6627, Belo Horizonte 31270-901, MG, Brazil
- Christine Azevedo Coste
- National Institute for Research in Computer Science and Automation (Inria), Camin Team, 34090 Montpellier, France
- Henrique Resende-Martins
- Graduate Program in Electrical Engineering, Universidade Federal de Minas Gerais, Av. Antônio Carlos 6627, Belo Horizonte 31270-901, MG, Brazil
9
Barnoy Y, Erin O, Raval S, Pryor W, Mair LO, Weinberg IN, Diaz-Mercado Y, Krieger A, Hager GD. Control of Magnetic Surgical Robots With Model-Based Simulators and Reinforcement Learning. IEEE Trans Med Robot Bionics 2022; 4:945-956. [PMID: 37600471] [PMCID: PMC10438915] [DOI: 10.1109/tmrb.2022.3214426]
Abstract
Magnetically manipulated medical robots are a promising alternative to current robotic platforms, allowing for miniaturization and tetherless actuation. Controlling such systems autonomously may enable safe, accurate operation. However, classical control methods require rigorous models of magnetic fields, robot dynamics, and robot environments, which can be difficult to generate. Model-free reinforcement learning (RL) offers an alternative that can bypass these requirements. We apply RL to a robotic magnetic needle manipulation system. Reinforcement learning algorithms often require long runtimes, making them impractical for many surgical robotics applications, most of which require careful, constant monitoring. Our approach first constructs a model-based simulation (MBS) from guided real-world exploration, learning the dynamics of the environment. After intensive training in the MBS environment, we transfer the learned behavior from the MBS environment to the real world. Our MBS method applies RL roughly 200 times faster than doing so in the real world, and achieves a 6 mm root-mean-square (RMS) error for a square reference trajectory. In comparison, pure simulation-based approaches fail to transfer, producing a 31 mm RMS error. These results demonstrate that MBS environments are a good solution for domains where running model-free RL is impractical, especially if an accurate simulation is not available.
Affiliation(s)
- Yotam Barnoy
- Department of Computer Science, The Johns Hopkins University, Baltimore, MD 21287 USA
- Onder Erin
- Department of Mechanical Engineering, The Johns Hopkins University, Baltimore, MD 21287 USA
- Suraj Raval
- Department of Mechanical Engineering, University of Maryland, College Park, MD 20742 USA
- Will Pryor
- Department of Mechanical Engineering, The Johns Hopkins University, Baltimore, MD 21287 USA
- Lamar O. Mair
- Weinberg Medical Physics, Inc., North Bethesda, MD 20852 USA
- Yancy Diaz-Mercado
- Department of Mechanical Engineering, University of Maryland, College Park, MD 20742 USA
- Axel Krieger
- Department of Mechanical Engineering, The Johns Hopkins University, Baltimore, MD 21287 USA
- Gregory D. Hager
- Department of Computer Science, The Johns Hopkins University, Baltimore, MD 21287 USA
10
Thanawala R, Jesneck J, Shelton J, Rhee R, Seymour NE. Overcoming Systems Factors in Case Logging with Artificial Intelligence Tools. J Surg Educ 2022; 79:1024-1030. [PMID: 35193831] [DOI: 10.1016/j.jsurg.2022.01.013] [Received: 03/27/2021] [Revised: 11/03/2021] [Accepted: 01/30/2022]
Abstract
INTRODUCTION: Case logs are foundational data in surgical education, yet cases are consistently under-reported. Logging behavior is driven by multiple human and systems factors, including time constraints, ease of case data retrieval, access to data-entry tools, and procedural code decision tools.
METHODS: We examined case logging trends at three mid-sized general surgery training programs from September 2016-October 2020, January 2019-October 2020, and May 2019-October 2020, respectively. Across the programs, we compared the number of cases logged per week when residents logged directly to the ACGME versus via a resident education platform with machine learning-based case logging assistance tools. We examined case logging patterns across 4 consecutive phases: baseline default ACGME logging prior to platform access (P0, "Manual"), full platform logging assistance (P1, "Assisted"), partial platform assistance requiring manual data entry without data integrations (P2, "Notebook"), and resumed fully integrated platform with logging assistance (P3, "Resumed").
RESULTS: 31,385 cases were logged via the platform since 2016 by 171 residents across the 3 programs. Intelligent case logging assistance significantly increased case logging rates, from 1.44 ± 1.48 cases per resident per week by manual entry in P0 to 4.77 ± 2.45 cases per resident per week via the platform in P1 (p < 0.00001). Despite the burden of manual data entry while the platform's data connectivity was paused, the tool still helped increase overall case logging into the ACGME, to 2.85 ± 2.37 cases per week (p = 0.0002). Upon resuming data connectivity, case logging rose to 4.54 ± 3.33 cases per week via the platform, equivalent to P1 levels (non-significant difference, p = 0.57).
CONCLUSIONS: Mapping the influence of systems and human factors on high-quality case logs allows us to target interventions to continually improve the training of surgical residents. System-level factors, such as access to automation-driven tools and operative schedule-integrated platforms that assist in ACGME case logging, have a significant impact on the number of cases captured in logs.
Affiliation(s)
- Ruchi Thanawala
- Department of Surgery, Division of Cardiothoracic Surgery, Section of Thoracic Surgery, Oregon Health and Science University, Portland, Oregon
- Julia Shelton
- Department of Surgery, Division of Pediatric Surgery, University of Iowa, Iowa City, Iowa
- Rebecca Rhee
- Department of Surgery, Division of Colorectal Surgery, Maimonides Medical Center, Brooklyn, New York
- Neal E Seymour
- Department of Surgery, University of Massachusetts Medical School-Baystate Medical Center, Springfield, Massachusetts
11
Zong K, Luo C. Reinforcement learning based framework for COVID-19 resource allocation. Comput Ind Eng 2022; 167:107960. [PMID: 35125625] [PMCID: PMC8800507] [DOI: 10.1016/j.cie.2022.107960] [Received: 10/12/2021] [Revised: 12/10/2021] [Accepted: 01/13/2022]
Abstract
In this paper, a reinforcement learning based framework is developed for COVID-19 resource allocation. We first construct an agent-based epidemic environment to model the transmission dynamics in multiple states. Then, a multi-agent reinforcement-learning algorithm is proposed based on the time-varying properties of the environment, and the performance of the algorithm is compared with other algorithms. According to the age distribution of populations and their economic conditions, the optimal lockdown resource allocation strategies of Arizona, California, Nevada, and Utah in the United States are determined using the proposed reinforcement-learning algorithm. Experimental results show that the framework can adopt more flexible resource allocation strategies and help decision makers to determine the optimal deployment of limited resources in infection prevention.
Affiliation(s)
- Kai Zong
- School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, China
- Cuicui Luo
- International College, University of Chinese Academy of Sciences, Beijing, China
12
Cheng X, Wang L, Lv Q, Wu H, Huang X, Yuan J, Sun X, Zhao X, Yan C, Yi Z. Reduced learning bias towards the reward context in medication-naive first-episode schizophrenia patients. BMC Psychiatry 2022; 22:123. [PMID: 35172748] [PMCID: PMC8851841] [DOI: 10.1186/s12888-021-03682-5] [Received: 09/22/2021] [Accepted: 12/28/2021]
Abstract
BACKGROUND: Reinforcement learning has been proposed to contribute to the development of amotivation in individuals with schizophrenia (SZ). Accumulating evidence suggests dysfunctional learning in individuals with SZ in Go/NoGo learning and expected value representation. However, previous findings might have been confounded by the effects of antipsychotic exposure. Moreover, reinforcement learning also relies on the learning context. Few studies have examined learning performance in the reward and loss-avoidance contexts separately in medication-naïve individuals with first-episode SZ. This study aimed to explore the behavioural profile of reinforcement learning performance in medication-naïve individuals with first-episode SZ, including contextual performance, Go/NoGo learning, and expected value representation.
METHODS: Twenty-nine medication-naïve individuals with first-episode SZ and 40 healthy controls (HCs), with no significant group differences in age and gender, completed the Gain and Loss Avoidance Task, a reinforcement learning task involving stimulus pairs presented in both the reward and loss-avoidance contexts. We assessed group differences in accuracy in the reward and loss-avoidance contexts, in Go/NoGo learning, and in expected value representation. Correlations between learning performance and negative symptom severity were examined.
RESULTS: Individuals with SZ showed significantly lower accuracy when learning under the reward context than under the loss-avoidance context, as compared to HCs. Accuracy under the reward context (90% win vs. 10% win) in the acquisition phase was significantly and negatively correlated with Scale for the Assessment of Negative Symptoms (SANS) avolition scores in individuals with SZ. On the other hand, individuals with SZ showed a spared ability for Go/NoGo learning and expected value representation.
CONCLUSIONS: Despite the small sample size and relatively modest findings, our results suggest a possible reduced learning bias towards the reward context among medication-naïve individuals with first-episode SZ. Reward learning performance was correlated with amotivation symptoms. This finding may facilitate our understanding of the underlying mechanism of negative symptoms. Reinforcement learning performance under the reward context may be important for better predicting and preventing the development of negative symptoms, especially amotivation, in patients with schizophrenia.
Affiliation(s)
- Xiaoyan Cheng
- Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, 600 South Wanping Road, Shanghai, China; Clinical Research Center for Mental Disorders, Shanghai Pudong New Area Mental Health Center, School of Medicine, Tongji University, Shanghai, China
- Lingling Wang
- Neuropsychology and Applied Cognitive Neuroscience Laboratory, CAS Key Laboratory of Mental Health, Institute of Psychology, Chinese Academy of Sciences, Beijing, China; Department of Psychology, University of Chinese Academy of Sciences, Beijing, China; Key Laboratory of Brain Functional Genomics (MOE&STCSM), Affiliated Mental Health Center (ECNU), School of Psychology and Cognitive Science, East China Normal University, 3663 North Zhongshan Road, Shanghai, 200062, China
- Qinyu Lv
- Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, 600 South Wanping Road, Shanghai, China
- Haisu Wu
- Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, 600 South Wanping Road, Shanghai, China
- Xinxin Huang
- Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, 600 South Wanping Road, Shanghai, China
- Jie Yuan
- Clinical Research Center for Mental Disorders, Shanghai Pudong New Area Mental Health Center, School of Medicine, Tongji University, Shanghai, China
- Xirong Sun
- Clinical Research Center for Mental Disorders, Shanghai Pudong New Area Mental Health Center, School of Medicine, Tongji University, Shanghai, China
- Xudong Zhao
- Clinical Research Center for Mental Disorders, Shanghai Pudong New Area Mental Health Center, School of Medicine, Tongji University, Shanghai, China
- Chao Yan
- Key Laboratory of Brain Functional Genomics (MOE&STCSM), Affiliated Mental Health Center (ECNU), School of Psychology and Cognitive Science, East China Normal University, 3663 North Zhongshan Road, Shanghai, 200062, China
- Zhenghui Yi
- Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, 600 South Wanping Road, Shanghai, China
Collapse
|
13
|
Martinez-Saito M, Andraszewicz S, Klucharev V, Rieskamp J. Mine or Ours? Neural Basis of the Exploitation of Common-Pool Resources. Soc Cogn Affect Neurosci 2022; 17:837-849. [PMID: 35104883] [PMCID: PMC9433840] [DOI: 10.1093/scan/nsac008]
Abstract
Why do people often exhaust unregulated common (shared) natural resources but manage to preserve similar private resources? To answer this question, we combine neurobiological, economic, and cognitive modeling approaches. Using functional magnetic resonance imaging on 50 participants, we show that a sharp decrease of common and private resources is associated with deactivation of the ventral striatum, a brain region involved in the valuation of outcomes. Across individuals, when facing a common resource, ventral striatal activity is anticorrelated with resource preservation (less harvesting), whereas with private resources the opposite pattern is observed. This indicates that neural value signals distinctly modulate behavior in response to the depletion of common vs. private resources. Computational modeling suggested that overharvesting of common resources was facilitated by the modulatory effect of social comparison on value signals. These results provide an explanation of people's tendency to over-exploit unregulated common natural resources.
Affiliation(s)
- Mario Martinez-Saito
- International Laboratory of Social Neurobiology, Institute of Cognitive Neuroscience, HSE University, Moscow 101000, Russia
- Sandra Andraszewicz
- Department of Humanities, Social and Political Sciences, ETH Zurich, Zurich 8006, Switzerland
- Department of Psychology, University of Basel, Basel 4055, Switzerland
- Vasily Klucharev
- International Laboratory of Social Neurobiology, Institute of Cognitive Neuroscience, HSE University, Moscow 101000, Russia
- Jörg Rieskamp
- Correspondence should be addressed to Jörg Rieskamp, Department of Psychology, University of Basel, Basel 4055, Switzerland.

14
Abdeldayem OM, Dabbish AM, Habashy MM, Mostafa MK, Elhefnawy M, Amin L, Al-Sakkari EG, Ragab A, Rene ER. Viral outbreaks detection and surveillance using wastewater-based epidemiology, viral air sampling, and machine learning techniques: A comprehensive review and outlook. Sci Total Environ 2022; 803:149834. [PMID: 34525746] [PMCID: PMC8379898] [DOI: 10.1016/j.scitotenv.2021.149834]
Abstract
A viral outbreak is a global challenge that affects public health and safety. The coronavirus disease 2019 (COVID-19) has been spreading globally, affecting millions of people worldwide, and has led to significant loss of lives and deterioration of the global economy. The adverse effects caused by the COVID-19 pandemic demand new detection methods for future viral outbreaks. Environmental transmission pathways include, but are not limited to, air, surface water, and wastewater environments. Wastewater surveillance, known as wastewater-based epidemiology (WBE), can potentially monitor viral outbreaks and provide a complementary clinical testing method. Another outbreak surveillance technique that has not yet been implemented in a sufficient number of studies is the surveillance of Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2) in the air. Artificial intelligence (AI) and the related machine learning (ML) and deep learning (DL) technologies are emerging techniques for detecting viral outbreaks using global data. To date, there are no reports that illustrate the potential of using WBE with AI to detect viral outbreaks. This study investigates the transmission pathways of SARS-CoV-2 in the environment and provides current updates on the surveillance of viral outbreaks using WBE, viral air sampling, and AI. It also proposes a novel framework based on an ensemble of ML and DL algorithms to provide a beneficial supportive tool for decision-makers. The framework exploits available data from reliable sources to discover meaningful insights and knowledge that allow researchers and practitioners to build efficient methods and protocols that accurately monitor and detect viral outbreaks. The proposed framework could provide early detection of viruses, forecast risk maps and vulnerable areas, and estimate the number of infected citizens.
Affiliation(s)
- Omar M Abdeldayem
- Department of Water Supply, Sanitation and Environmental Engineering, IHE Delft Institute for Water Education, Westvest 7, 2611AX Delft, the Netherlands
- Areeg M Dabbish
- Biotechnology Graduate Program, Biology Department, School of Science and Engineering, The American University in Cairo, New Cairo 11835, Egypt
- Mahmoud M Habashy
- Department of Water Supply, Sanitation and Environmental Engineering, IHE Delft Institute for Water Education, Westvest 7, 2611AX Delft, the Netherlands
- Mohamed K Mostafa
- Faculty of Engineering and Technology, Badr University in Cairo (BUC), Cairo 11829, Egypt
- Mohamed Elhefnawy
- CanmetENERGY, 1615 Lionel-Boulet Blvd, P.O. Box 4800, Varennes, Québec J3X 1P7, Canada; Department of Mathematics and Industrial Engineering, Polytechnique Montréal, 2500 Chemin de Polytechnique, Montréal, Québec H3T 1J4, Canada
- Lobna Amin
- Department of Water Supply, Sanitation and Environmental Engineering, IHE Delft Institute for Water Education, Westvest 7, 2611AX Delft, the Netherlands; Department of Built Environment, Aalto University, PO Box 15200, FI-00076 Aalto, Finland
- Eslam G Al-Sakkari
- Chemical Engineering Department, Cairo University, Cairo University Road, 12613 Giza, Egypt
- Ahmed Ragab
- CanmetENERGY, 1615 Lionel-Boulet Blvd, P.O. Box 4800, Varennes, Québec J3X 1P7, Canada; Department of Mathematics and Industrial Engineering, Polytechnique Montréal, 2500 Chemin de Polytechnique, Montréal, Québec H3T 1J4, Canada; Faculty of Electronic Engineering, Menoufia University, 32952 Menouf, Egypt
- Eldon R Rene
- Department of Water Supply, Sanitation and Environmental Engineering, IHE Delft Institute for Water Education, Westvest 7, 2611AX Delft, the Netherlands

15
Abstract
Artificial intelligence (AI) tools find increasing application in drug discovery, supporting every stage of the Design-Make-Test-Analyse (DMTA) cycle. The main focus of this chapter is their application to molecular generation with the aid of deep neural networks (DNN). We present a historical overview of the main advances in the field. We analyze the concepts of distribution learning and goal-directed learning and then highlight some recent applications of generative models in drug design, with a focus on research from the biopharmaceutical industry. We present in more detail REINVENT, an open-source software package developed within our group at AstraZeneca and the main platform for AI molecular design support across a number of medicinal chemistry projects in the company, and we also demonstrate some of our work in library design. Finally, we present some of the main challenges in the application of AI in drug discovery and different approaches to respond to these challenges, which define areas for current and future work.
16
Gil Ó, Garrell A, Sanfeliu A. Social Robot Navigation Tasks: Combining Machine Learning Techniques and Social Force Model. Sensors (Basel) 2021; 21:7087. [PMID: 34770395] [DOI: 10.3390/s21217087]
Abstract
Social robot navigation in public spaces, buildings, or private houses is a difficult problem that is not well solved, owing to environmental constraints (buildings, static objects, etc.), pedestrians, and other mobile vehicles. Moreover, robots have to move in a human-aware manner; that is, robots have to navigate in such a way that people feel safe and comfortable. In this work, we present two navigation tasks, social robot navigation and robot accompaniment, which combine machine learning techniques with the Social Force Model (SFM), allowing human-aware social navigation. The robots in both approaches use data from different sensors to capture knowledge of the environment as well as information on pedestrian motion. Both navigation tasks make use of the SFM, a general framework in which human motion behaviors can be expressed through a set of functions depending on the pedestrians' relative and absolute positions and velocities. Additionally, in both tasks the robot's motion behavior is learned using machine learning techniques: in the first case using supervised deep learning and, in the second case, using Reinforcement Learning (RL). The machine learning techniques are combined with the SFM to create navigation models that behave in a social manner when the robot is navigating in an environment with pedestrians or accompanying a person. The systems were validated with a large set of simulations and real-life experiments with a new humanoid robot named IVO and with an aerial robot. The experiments show that the combination of SFM and machine learning can solve human-aware robot navigation in complex dynamic environments.
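The SFM mentioned in this abstract models pedestrians and robots as point masses driven by a goal-attraction term plus exponential repulsion from nearby agents (following Helbing and Molnár's classic formulation). The sketch below is illustrative, not the paper's implementation; the parameter values (`v0`, `tau`, `A`, `B`) are assumptions chosen for readability.

```python
import numpy as np

def social_force(pos, vel, goal, peds, v0=1.3, tau=0.5, A=2.0, B=0.3):
    """Net force on the robot: goal attraction plus pedestrian repulsion.

    pos, vel: robot position/velocity (2-vectors); goal: goal position;
    peds: pedestrian positions, shape (n, 2). v0, tau, A, B are
    illustrative SFM parameters (desired speed, relaxation time,
    repulsion strength and range), not values from the paper.
    """
    # Attractive term: relax toward the desired velocity v0 * e_goal.
    e_goal = (goal - pos) / np.linalg.norm(goal - pos)
    f = (v0 * e_goal - vel) / tau
    # Repulsive term: exponential decay with distance to each pedestrian.
    for p in np.atleast_2d(peds):
        d = pos - p
        dist = np.linalg.norm(d)
        f += A * np.exp(-dist / B) * d / dist
    return f

force = social_force(np.array([0.0, 0.0]), np.array([0.0, 0.0]),
                     np.array([5.0, 0.0]), np.array([[1.0, 0.5]]))
```

In the paper's setting, learned components (deep networks or RL policies) replace or modulate hand-tuned terms like these rather than discard the SFM structure.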
17
Eckstein MK, Wilbrecht L, Collins AGE. What do Reinforcement Learning Models Measure? Interpreting Model Parameters in Cognition and Neuroscience. Curr Opin Behav Sci 2021; 41:128-137. [PMID: 34984213] [PMCID: PMC8722372] [DOI: 10.1016/j.cobeha.2021.06.004]
Abstract
Reinforcement learning (RL) is a concept that has been invaluable to fields including machine learning, neuroscience, and cognitive science. However, what RL entails differs between fields, leading to difficulties when interpreting and translating findings. After laying out these differences, this paper focuses on cognitive (neuro)science to discuss how we as a field might over-interpret RL modeling results. We too often assume, implicitly, that modeling results generalize between tasks, models, and participant populations, despite negative empirical evidence for this assumption. We also often assume that parameters measure specific, unique (neuro)cognitive processes, a concept we call interpretability, when evidence suggests that they capture different functions across studies and tasks. We conclude that future computational research needs to pay increased attention to implicit assumptions when using RL models, and suggest that a more systematic understanding of contextual factors will help address these issues and improve the ability of RL to explain brain and behavior.
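The "parameters" whose interpretation this review scrutinizes are typically those of a delta-rule learner with softmax choice, such as the learning rate and inverse temperature. A minimal sketch of that standard cognitive RL model, on an assumed two-armed bandit with illustrative parameter values:

```python
import numpy as np

def softmax(q, beta):
    """Choice probabilities; beta is the inverse temperature."""
    e = np.exp(beta * (q - q.max()))
    return e / e.sum()

def simulate(alpha=0.3, beta=5.0, trials=200, p_reward=(0.8, 0.2), seed=0):
    """Two-armed bandit with a delta-rule learner.

    alpha (learning rate) and beta (inverse temperature) are exactly the
    kind of fitted parameters whose cross-task meaning the review questions.
    """
    rng = np.random.default_rng(seed)
    q = np.zeros(2)
    choices = []
    for _ in range(trials):
        a = rng.choice(2, p=softmax(q, beta))
        r = float(rng.random() < p_reward[a])
        q[a] += alpha * (r - q[a])   # delta rule: prediction-error update
        choices.append(int(a))
    return q, choices

q, choices = simulate()
```

Fitting `alpha` and `beta` to human choice data, and then interpreting them as stable (neuro)cognitive traits, is the inferential step the paper argues deserves more caution.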
Affiliation(s)
- Maria K Eckstein
- Department of Psychology, UC Berkeley, 2121 Berkeley Way West, Berkeley, CA 94720, USA
- Linda Wilbrecht
- Department of Psychology, UC Berkeley, 2121 Berkeley Way West, Berkeley, CA 94720, USA
- Helen Wills Neuroscience Institute, UC Berkeley, 175 Li Ka Shing Center, Berkeley, CA 94720, USA
- Anne G E Collins
- Department of Psychology, UC Berkeley, 2121 Berkeley Way West, Berkeley, CA 94720, USA
- Helen Wills Neuroscience Institute, UC Berkeley, 175 Li Ka Shing Center, Berkeley, CA 94720, USA

18
Pereira T, Abbasi M, Ribeiro B, Arrais JP. Diversity oriented Deep Reinforcement Learning for targeted molecule generation. J Cheminform 2021; 13:21. [PMID: 33750461] [PMCID: PMC7944916] [DOI: 10.1186/s13321-021-00498-z]
Abstract
In this work, we explore the potential of deep learning to streamline the process of identifying new potential drugs through the computational generation of molecules with interesting biological properties. Two deep neural networks compose our targeted generation framework: the Generator, which is trained to learn the building rules of valid molecules employing SMILES string notation, and the Predictor, which evaluates the newly generated compounds by predicting their affinity for the desired target. The Generator is then optimized through Reinforcement Learning to produce molecules with bespoke properties. The innovation of this approach is the exploratory strategy applied during the reinforcement training process, which seeks to add novelty to the generated compounds. This training strategy employs two Generators interchangeably to sample new SMILES: the initially trained model, which remains fixed, and a copy of it that is updated during training to uncover the most promising molecules. The evolution of the reward assigned by the Predictor determines how often each one is employed to select the next token of the molecule. This strategy establishes a compromise between the need to acquire more information about the chemical space and the need to sample new molecules with the experience gained so far. To demonstrate the effectiveness of the method, the Generator is trained to design molecules with an optimized partition coefficient and also high inhibitory power against the Adenosine [Formula: see text] and [Formula: see text] opioid receptors. The results reveal that the model can effectively adjust the newly generated molecules towards the wanted direction. More importantly, it was possible to find promising sets of unique and diverse molecules, which was the main purpose of the newly implemented strategy.
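The core loop described here, sampling token sequences and pushing the generator toward higher Predictor reward, is a policy-gradient (REINFORCE) update. The toy below is only a sketch of that mechanism under heavy simplifications: a three-token vocabulary stands in for SMILES, a token-counting function stands in for the Predictor, and the paper's two-generator exploration scheme is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(1)
VOCAB = ["A", "B", "C"]          # toy stand-in for SMILES tokens
logits = np.zeros(len(VOCAB))    # unconditional token policy (toy Generator)

def sample_seq(logits, length=8):
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return rng.choice(len(VOCAB), size=length, p=p), p

def reward(seq):
    # Toy "Predictor": fraction of the desired token 'A' in the sequence.
    return float(np.mean(seq == 0))

def train(logits, steps=300, lr=0.5):
    baseline = 0.0
    for _ in range(steps):
        seq, p = sample_seq(logits)
        r = reward(seq)
        baseline = 0.9 * baseline + 0.1 * r      # variance-reduction baseline
        # REINFORCE: raise log-prob of sampled tokens, weighted by advantage.
        grad = np.zeros_like(logits)
        for t in seq:
            grad += np.eye(len(VOCAB))[t] - p
        logits += lr * (r - baseline) * grad / len(seq)
    return logits

before = np.mean([reward(sample_seq(np.zeros(3))[0]) for _ in range(200)])
logits = train(logits)
after = np.mean([reward(sample_seq(logits)[0]) for _ in range(200)])
```

In the paper, the policy is a recurrent network over real SMILES and the reward is a learned affinity predictor; the update principle is the same.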
Affiliation(s)
- Tiago Pereira
- Department of Informatics Engineering, Centre for Informatics and Systems of the University of Coimbra, University of Coimbra, Pinhal de Marrocos, Coimbra, Portugal
- Maryam Abbasi
- Department of Informatics Engineering, Centre for Informatics and Systems of the University of Coimbra, University of Coimbra, Pinhal de Marrocos, Coimbra, Portugal
- Bernardete Ribeiro
- Department of Informatics Engineering, Centre for Informatics and Systems of the University of Coimbra, University of Coimbra, Pinhal de Marrocos, Coimbra, Portugal
- Joel P. Arrais
- Department of Informatics Engineering, Centre for Informatics and Systems of the University of Coimbra, University of Coimbra, Pinhal de Marrocos, Coimbra, Portugal

19
Khadilkar H, Ganu T, Seetharam DP. Optimising Lockdown Policies for Epidemic Control using Reinforcement Learning: An AI-Driven Control Approach Compatible with Existing Disease and Network Models. Trans Indian Natl Acad Eng 2020; 5:129-132. [PMID: 38624387] [PMCID: PMC7311597] [DOI: 10.1007/s41403-020-00129-3]
Abstract
There has been intense debate about lockdown policies in the context of Covid-19 for limiting damage both to health and to the economy. We present an AI-driven approach for generating optimal lockdown policies that control the spread of the disease while balancing both health and economic costs. Furthermore, the proposed reinforcement learning approach automatically learns those policies, as a function of disease and population parameters. The approach accounts for imperfect lockdowns, can be used to explore a range of policies using tunable parameters, and can be easily extended to fine-grained lockdown strictness. The control approach can be used with any compatible disease and network simulation models.
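The control setup described, an RL agent choosing lockdown actions against a disease simulator, with a reward that trades infection burden against economic cost, can be sketched with tabular Q-learning over a coarsely discretized SIR model. Everything below is illustrative (bucket count, contact rates, the "imperfect lockdown" factor, the economic-cost weight), not the paper's model or parameters.

```python
import numpy as np

def sir_step(s, i, beta, gamma=0.1):
    """One step of a discrete-time SIR model (fractions of population)."""
    new_inf = beta * s * i
    rec = gamma * i
    return s - new_inf, i + new_inf - rec

def train(episodes=400, alpha=0.2, gamma_rl=0.95, eps=0.1,
          econ_cost=0.02, seed=0):
    """Tabular Q-learning: state = infection-level bucket, action = lockdown on/off."""
    rng = np.random.default_rng(seed)
    q = np.zeros((10, 2))                      # 10 infection buckets x 2 actions
    for _ in range(episodes):
        s, i = 0.99, 0.01
        for _ in range(50):
            b = min(int(i * 10), 9)
            a = int(rng.integers(2)) if rng.random() < eps else int(q[b].argmax())
            beta = 0.12 if a == 1 else 0.35    # lockdown imperfectly cuts contacts
            s, i = sir_step(s, i, beta)
            r = -i - econ_cost * a             # infection burden + economic cost
            b2 = min(int(i * 10), 9)
            q[b, a] += alpha * (r + gamma_rl * q[b2].max() - q[b, a])
    return q

q = train()
```

Because any simulator exposing a step function can slot into `sir_step`, the scheme is compatible with richer disease and network models, which is the compatibility point the abstract emphasizes.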
20
Xu H, Liu X, Yu W, Griffith D, Golmie N. Reinforcement Learning-Based Control and Networking Co-design for Industrial Internet of Things. IEEE J Sel Areas Commun 2020; 38. [PMID: 37555009] [PMCID: PMC10408385] [DOI: 10.1109/jsac.2020.2980909]
Abstract
Industrial Internet-of-Things (IIoT), also known as Industry 4.0, is the integration of Internet of Things (IoT) technology into industrial manufacturing systems so that the connectivity, efficiency, and intelligence of factories and plants can be improved. From a cyber-physical system (CPS) perspective, multiple systems (e.g., control, networking, and computing systems) are synthesized interactively into IIoT systems to achieve the operator's design goals. The interactions among different systems are a non-negligible factor that affects IIoT design and requirements, such as automation, especially under dynamic industrial operations. In this paper, we leverage reinforcement learning techniques to automatically configure the control and networking systems under a dynamic industrial environment. We design three new policies based on the characteristics of industrial systems so that the reinforcement learning converges rapidly. We implement and integrate the reinforcement learning-based co-design approach on a realistic wireless cyber-physical simulator to conduct extensive experiments. Our experimental results demonstrate that our approach can effectively and quickly reconfigure the control and networking systems automatically in a dynamic industrial environment.
Affiliation(s)
- David Griffith
- National Institute of Standards and Technology (NIST), USA
- Nada Golmie
- National Institute of Standards and Technology (NIST), USA

21
Liao P, Greenewald K, Klasnja P, Murphy S. Personalized HeartSteps: A Reinforcement Learning Algorithm for Optimizing Physical Activity. Proc ACM Interact Mob Wearable Ubiquitous Technol 2020; 4:18. [PMID: 34527853] [PMCID: PMC8439432] [DOI: 10.1145/3381007]
Abstract
With the recent proliferation of mobile health technologies, health scientists are increasingly interested in developing just-in-time adaptive interventions (JITAIs), typically delivered via notifications on mobile devices and designed to help users prevent negative health outcomes and to promote the adoption and maintenance of healthy behaviors. A JITAI involves a sequence of decision rules (i.e., treatment policies) that take the user's current context as input and specify whether and what type of intervention should be provided at the moment. In this work, we describe a reinforcement learning (RL) algorithm that continuously learns and improves the treatment policy embedded in the JITAI as data is collected from the user. This work is motivated by our collaboration on designing an RL algorithm for HeartSteps V2 based on data collected in HeartSteps V1. HeartSteps is a physical activity mobile health application. The RL algorithm developed in this work is being used in HeartSteps V2 to decide, five times per day, whether to deliver a context-tailored activity suggestion.
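Online decision rules of this kind, "given the current context, send a suggestion or not", are commonly built on Bayesian contextual bandits with Thompson sampling. The sketch below is a generic such bandit, not the actual HeartSteps algorithm; the context features, priors, and parameter values are assumptions for illustration.

```python
import numpy as np

class ThompsonLinearBandit:
    """Bayesian linear bandit: a Gaussian posterior over reward weights per action."""

    def __init__(self, dim, n_actions=2, noise=1.0, prior_var=1.0, seed=0):
        self.rng = np.random.default_rng(seed)
        self.noise = noise
        # Posterior precision matrix and weighted-observation vector per action.
        self.A = [np.eye(dim) / prior_var for _ in range(n_actions)]
        self.b = [np.zeros(dim) for _ in range(n_actions)]

    def act(self, context):
        """Draw weights from each action's posterior; pick the best-scoring action."""
        scores = []
        for A, b in zip(self.A, self.b):
            cov = np.linalg.inv(A)
            mean = cov @ b
            w = self.rng.multivariate_normal(mean, cov)  # posterior sample
            scores.append(context @ w)
        return int(np.argmax(scores))

    def update(self, context, action, reward):
        """Standard Bayesian linear-regression update for the chosen action."""
        self.A[action] += np.outer(context, context) / self.noise
        self.b[action] += context * reward / self.noise

bandit = ThompsonLinearBandit(dim=3)
ctx = np.array([1.0, 0.2, -0.5])   # hypothetical features: bias, recent activity, weather
a = bandit.act(ctx)                # 0 = no notification, 1 = send suggestion
bandit.update(ctx, a, reward=1.0)  # e.g. subsequent step count as reward
```

Because the posterior tightens as data accrues, the policy personalizes to each user over time, which is the continuous-learning property the abstract highlights.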
22
Alexiadis A. Deep multiphysics: Coupling discrete multiphysics with machine learning to attain self-learning in-silico models replicating human physiology. Artif Intell Med 2019; 98:27-34. [PMID: 31521250] [DOI: 10.1016/j.artmed.2019.06.005]
Abstract
OBJECTIVES The objective of this study is to devise a modelling strategy for attaining in-silico models replicating human physiology and, in particular, the activity of the autonomic nervous system. METHOD Discrete Multiphysics (a multiphysics modelling technique) and Reinforcement Learning (a machine learning algorithm) are combined to achieve an in-silico model with the ability to self-learn and replicate feedback loops occurring in human physiology. Computational particles, used in Discrete Multiphysics to model biological systems, are associated with (computational) neurons: Reinforcement Learning trains these neurons to behave as they would in real biological systems. RESULTS As benchmark/validation, we use the case of peristalsis in the oesophagus. Results show that the in-silico model effectively learns by itself how to propel the bolus in the oesophagus. CONCLUSIONS The combination of first-principles modelling (e.g. multiphysics) and machine learning (e.g. Reinforcement Learning) represents a new powerful tool for in-silico modelling of human physiology. Biological feedback loops occurring, for instance, in peristaltic or metachronal motion, which until now could not be accounted for in in-silico models, can be tackled by the proposed technique.
Affiliation(s)
- Alessio Alexiadis
- School of Chemical Engineering, University of Birmingham, Edgbaston, Birmingham B15 2TT, United Kingdom

23
Abstract
In this paper, we introduce a new type of tree-based method, reinforcement learning trees (RLT), which exhibits significantly improved performance over traditional methods such as random forests (Breiman, 2001) under high-dimensional settings. The innovations are three-fold. First, the new method implements reinforcement learning at each selection of a splitting variable during the tree construction process. By splitting on the variable that brings the greatest future improvement in later splits, rather than choosing the one with the largest marginal effect from the immediate split, the constructed tree utilizes the available samples more efficiently. Moreover, such an approach enables linear combination cuts at little extra computational cost. Second, we propose a variable muting procedure that progressively eliminates noise variables during the construction of each individual tree. The muting procedure also takes advantage of reinforcement learning and prevents noise variables from being considered in the search for splitting rules, so that towards terminal nodes, where the sample size is small, the splitting rules are still constructed from only strong variables. Last, we investigate asymptotic properties of the proposed method under basic assumptions and discuss the rationale in general settings.
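The "greatest future improvement" idea can be made concrete with a one-step lookahead: score a candidate split not by its own variance reduction but by the best reduction it enables in the children. The toy below (an XOR-style target, where the truly informative variables have no marginal effect) illustrates that distinction only; it is not the authors' algorithm, and the fixed threshold at 0 is a simplifying assumption.

```python
import numpy as np

def sse(y):
    """Sum of squared errors around the mean; 0 for an empty node."""
    return ((y - y.mean()) ** 2).sum() if len(y) else 0.0

def marginal_gain(x, y):
    """Variance reduction from splitting on x at threshold 0."""
    left, right = y[x <= 0], y[x > 0]
    return sse(y) - sse(left) - sse(right)

def lookahead_gain(X, y, j, others):
    """Split on variable j, then credit the best follow-up split in each child."""
    gain = marginal_gain(X[:, j], y)
    for mask in (X[:, j] <= 0, X[:, j] > 0):
        child_y = y[mask]
        if len(child_y) < 2:
            continue
        gain += max(marginal_gain(X[mask, k], child_y) for k in others)
    return gain

rng = np.random.default_rng(0)
X = rng.choice([-1.0, 1.0], size=(400, 3))   # column 2 is pure noise
y = X[:, 0] * X[:, 1]                        # XOR-like: no marginal effect

strong = lookahead_gain(X, y, 0, others=[1, 2])  # informative variable
noise = lookahead_gain(X, y, 2, others=[0, 1])   # noise variable
```

A marginal criterion scores all three variables near zero here, while the lookahead score clearly separates the informative variable from the noise one, which is why the immediate-gain heuristic wastes samples in such settings.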
Affiliation(s)
- Ruoqing Zhu
- Department of Biostatistics, CB#7420, University of North Carolina, Chapel Hill, NC 27599-7420
- Donglin Zeng
- Department of Biostatistics, CB#7420, University of North Carolina, Chapel Hill, NC 27599-7420
- Michael R Kosorok
- Department of Biostatistics, CB#7420, University of North Carolina, Chapel Hill, NC 27599-7420

24
Balasubramani PP, Chakravarthy VS, Ravindran B, Moustafa AA. An extended reinforcement learning model of basal ganglia to understand the contributions of serotonin and dopamine in risk-based decision making, reward prediction, and punishment learning. Front Comput Neurosci 2014; 8:47. [PMID: 24795614] [PMCID: PMC3997037] [DOI: 10.3389/fncom.2014.00047]
Abstract
Although empirical and neural studies show that serotonin (5HT) plays many functional roles in the brain, prior computational models mostly focus on its role in behavioral inhibition. In this study, we present a model of risk-based decision making in a modified Reinforcement Learning (RL) framework. The model depicts the roles of dopamine (DA) and serotonin (5HT) in the Basal Ganglia (BG). In this model, the DA signal is represented by the temporal difference error (δ), while the 5HT signal is represented by a parameter (α) that controls risk prediction error. This formulation, which accommodates both 5HT and DA, reconciles some of the diverse roles of 5HT, particularly in connection with the BG system. We apply the model to different experimental paradigms used to study the role of 5HT: (1) risk-sensitive decision making, where 5HT controls risk assessment; (2) temporal reward prediction, where 5HT controls the time-scale of reward prediction; and (3) reward/punishment sensitivity, in which the punishment prediction error depends on 5HT levels. Thus the proposed integrated RL model reconciles several existing theories of 5HT and DA in the BG.
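The ingredients named in the abstract, a TD error δ (the DA-like signal), a risk estimate updated by a risk prediction error, and a parameter α (the 5HT-like signal) weighting risk in the choice utility, can be sketched on a risk-sensitive bandit. This is a loose illustration of those ingredients under assumed parameter values, not the paper's full BG model.

```python
import numpy as np

def run(alpha, trials=2000, lam=0.1, seed=0):
    """Two arms with equal mean reward but different reward variance.

    alpha plays the 5HT-like role: it weights the risk term in the
    utility, so a larger alpha should bias choice toward the safe arm.
    """
    rng = np.random.default_rng(seed)
    q = np.zeros(2)      # expected reward per arm
    h = np.ones(2)       # risk (reward-variance) estimate per arm
    picks = np.zeros(2)
    for t in range(trials):
        # Utility trades expected value against estimated risk.
        u = q - alpha * np.sqrt(h)
        explore = t < 100 or rng.random() < 0.1
        a = int(rng.integers(2)) if explore else int(u.argmax())
        r = rng.normal(1.0, 0.1 if a == 0 else 2.0)  # arm 0 safe, arm 1 risky
        delta = r - q[a]                   # TD error: the DA-like signal
        q[a] += lam * delta
        h[a] += lam * (delta ** 2 - h[a])  # risk prediction error update
        picks[a] += 1
    return picks

safe_seeking = run(alpha=1.0)
```

With α = 1 the agent settles on the low-variance arm despite equal means; setting α near 0 removes the risk penalty, mimicking how the model maps 5HT level onto risk sensitivity.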
Affiliation(s)
- Balaraman Ravindran
- Department of Computer Science and Engineering, Indian Institute of Technology Madras, Chennai, India
- Ahmed A Moustafa
- Foundational Processes of Behaviour Research Concentration, Marcs Institute for Brain and Behaviour & School of Social Sciences and Psychology, University of Western Sydney, Sydney, NSW, Australia

25
Abstract
In this paper, we consider the batch mode reinforcement learning setting, where the central problem is to learn from a sample of trajectories a policy that satisfies or optimizes a performance criterion. We focus on the continuous state space case for which usual resolution schemes rely on function approximators either to represent the underlying control problem or to represent its value function. As an alternative to the use of function approximators, we rely on the synthesis of "artificial trajectories" from the given sample of trajectories, and show that this idea opens new avenues for designing and analyzing algorithms for batch mode reinforcement learning.
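For context, the standard batch-mode scheme this work departs from is fitted Q iteration: repeatedly regress Bellman targets computed from the fixed sample of transitions. The sketch below uses a tabular average as the "regression" step on a two-state toy problem; the paper's artificial-trajectory idea replaces exactly this function-approximation step, so treat the code as the baseline being contrasted, not the authors' method.

```python
import numpy as np

def fitted_q_iteration(transitions, n_states, n_actions, gamma=0.9, iters=50):
    """Batch-mode RL: learn Q from a fixed sample of (s, a, r, s') tuples only."""
    q = np.zeros((n_states, n_actions))
    for _ in range(iters):
        # Build regression targets r + gamma * max_a' Q(s', a') from the batch.
        targets = {}
        for s, a, r, s2 in transitions:
            targets.setdefault((s, a), []).append(r + gamma * q[s2].max())
        new_q = np.zeros_like(q)
        for (s, a), ys in targets.items():
            new_q[s, a] = np.mean(ys)   # "fit" = average of the targets
        q = new_q
    return q

# Two-state chain: action 1 moves toward state 1, which pays reward 1.
batch = [(0, 0, 0.0, 0), (0, 1, 0.0, 1), (1, 0, 0.0, 0), (1, 1, 1.0, 1)]
q = fitted_q_iteration(batch, n_states=2, n_actions=2)
policy = q.argmax(axis=1)
```

In continuous state spaces the averaging step becomes a supervised regressor, and the quality of that approximator dominates the result, which motivates the artificial-trajectory alternative analyzed in the paper.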