1
Friston KJ, Da Costa L, Tschantz A, Kiefer A, Salvatori T, Neacsu V, Koudahl M, Heins C, Sajid N, Markovic D, Parr T, Verbelen T, Buckley CL. Supervised structure learning. Biol Psychol 2024; 193:108891. [PMID: 39433209] [DOI: 10.1016/j.biopsycho.2024.108891]
Abstract
This paper concerns structure learning or discovery of discrete generative models. It focuses on Bayesian model selection and the assimilation of training data or content, with a special emphasis on the order in which data are ingested. A key move in the ensuing schemes is to place priors on the selection of models, based upon expected free energy. In this setting, expected free energy reduces to a constrained mutual information, where the constraints inherit from priors over outcomes (i.e., preferred outcomes). The resulting scheme is first used to perform image classification on the MNIST dataset to illustrate the basic idea, and then tested on a more challenging problem of discovering models with dynamics, using a simple sprite-based visual disentanglement paradigm and the Tower of Hanoi (cf., blocks world) problem. In these examples, generative models are constructed autodidactically to recover (i.e., disentangle) the factorial structure of latent states and their characteristic paths or dynamics.
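As a rough sketch of the selection criterion the abstract describes (not code from the paper; the function name, array layout, and variable names are illustrative), expected free energy for a discrete model can be scored as an information-gain term plus an expected-preference term over outcomes:

```python
import numpy as np

def expected_free_energy(A, q_s, log_C, eps=1e-16):
    """Score a candidate discrete model by expected free energy.

    A     : (n_outcomes, n_states) likelihood matrix p(o | s)
    q_s   : (n_states,) beliefs over latent states
    log_C : (n_outcomes,) log prior preferences over outcomes
    """
    q_o = A @ q_s  # predicted outcome distribution q(o)
    # Epistemic term: mutual information I(o; s) = E_q(s)[ KL(p(o|s) || q(o)) ]
    info_gain = np.sum(
        q_s * np.sum(A * (np.log(A + eps) - np.log(q_o + eps)[:, None]), axis=0)
    )
    # Pragmatic term: expected log-preference E_q(o)[ log C(o) ]
    pragmatic = q_o @ log_C
    # Lower expected free energy = more informative and more preferred outcomes
    return -info_gain - pragmatic
```

With a flat preference prior (log_C constant), the score reduces to negative mutual information between states and outcomes, which is the constrained-mutual-information reading given in the abstract.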
Affiliation(s)
- Karl J Friston
- Wellcome Trust Centre for Neuroimaging, Institute of Neurology, University College London, UK; VERSES AI Research Lab, Los Angeles, CA, 90016, USA
- Lancelot Da Costa
- Wellcome Trust Centre for Neuroimaging, Institute of Neurology, University College London, UK; VERSES AI Research Lab, Los Angeles, CA, 90016, USA; Department of Mathematics, Imperial College London, UK
- Alexander Tschantz
- VERSES AI Research Lab, Los Angeles, CA, 90016, USA; School of Engineering and Informatics, University of Sussex, Brighton, UK
- Alex Kiefer
- VERSES AI Research Lab, Los Angeles, CA, 90016, USA
- Victorita Neacsu
- Wellcome Trust Centre for Neuroimaging, Institute of Neurology, University College London, UK
- Conor Heins
- VERSES AI Research Lab, Los Angeles, CA, 90016, USA
- Noor Sajid
- Wellcome Trust Centre for Neuroimaging, Institute of Neurology, University College London, UK
- Dimitrije Markovic
- Chair of Cognitive Computational Neuroscience, Technische Universität Dresden, Dresden, Germany
- Thomas Parr
- Nuffield Department of Clinical Neurosciences, University of Oxford, UK
- Tim Verbelen
- VERSES AI Research Lab, Los Angeles, CA, 90016, USA
- Christopher L Buckley
- VERSES AI Research Lab, Los Angeles, CA, 90016, USA; School of Engineering and Informatics, University of Sussex, Brighton, UK
2
Champion T, Grześ M, Bonheme L, Bowman H. Deconstructing Deep Active Inference: A Contrarian Information Gatherer. Neural Comput 2024; 36:2403-2445. [PMID: 39141805] [DOI: 10.1162/neco_a_01697]
Abstract
Active inference is a theory of perception, learning, and decision making that can be applied to neuroscience, robotics, psychology, and machine learning. Recently, intensive research has been taking place to scale up this framework using Monte Carlo tree search and deep learning. The goal of this activity is to solve more complicated tasks using deep active inference. First, we review the existing literature and then progressively build a deep active inference agent as follows: we (1) implement a variational autoencoder (VAE), (2) implement a deep hidden Markov model (HMM), and (3) implement a deep critical hidden Markov model (CHMM). For the CHMM, we implemented two versions: one minimizing expected free energy, CHMM[EFE], and one maximizing rewards, CHMM[reward]. Then we experimented with three different action selection strategies: the ε-greedy algorithm, softmax selection, and best action selection. According to our experiments, the models able to solve the dSprites environment are the ones that maximize rewards. On further inspection, we found that the CHMM minimizing expected free energy almost always picks the same action, which makes it unable to solve the dSprites environment. In contrast, the CHMM maximizing reward keeps selecting all the actions, enabling it to successfully solve the task. The only difference between those two CHMMs is the epistemic value, which aims to make the outputs of the transition and encoder networks as close as possible. Thus, the CHMM minimizing expected free energy repeatedly picks a single action and becomes an expert at predicting the future when selecting this action. This effectively makes the KL divergence between the outputs of the transition and encoder networks small. Additionally, when selecting the action "down", the average reward is zero, while for all the other actions the expected reward is negative. Therefore, if the CHMM has to stick to a single action to keep the KL divergence small, then the action "down" is the most rewarding. We also show in simulation that the epistemic value used in deep active inference can behave degenerately and, in certain circumstances, effectively lose, rather than gain, information. As the agent minimizing EFE is not able to explore its environment, the appropriate formulation of the epistemic value in deep active inference remains an open question.
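A minimal sketch of the three action-selection strategies the abstract compares (ε-greedy, softmax, and best-action selection), written here as a generic helper; the scores passed in could be negative expected free energy or expected reward, and all names are illustrative rather than taken from the paper's code:

```python
import numpy as np

rng = np.random.default_rng(0)

def select_action(values, strategy="softmax", epsilon=0.1, temperature=1.0):
    """Pick an action index from a vector of per-action scores
    (e.g. negative expected free energy or expected reward)."""
    values = np.asarray(values, dtype=float)
    if strategy == "best":
        # Deterministic: always the highest-scoring action
        return int(np.argmax(values))
    if strategy == "epsilon_greedy":
        # Explore uniformly with probability epsilon, otherwise exploit
        if rng.random() < epsilon:
            return int(rng.integers(len(values)))
        return int(np.argmax(values))
    if strategy == "softmax":
        # Sample from a Boltzmann distribution over the scores
        logits = values / temperature
        p = np.exp(logits - logits.max())
        p /= p.sum()
        return int(rng.choice(len(values), p=p))
    raise ValueError(f"unknown strategy: {strategy}")
```

Lower temperatures and smaller epsilon make the stochastic strategies collapse toward best-action selection, which is the regime in which an agent that already favours a single action (as the EFE-minimizing CHMM does) stops exploring.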
Affiliation(s)
- Théophile Champion
- University of Birmingham, School of Computer Science, Birmingham B15 2TT, U.K.
- Marek Grześ
- University of Kent, School of Computing, Canterbury CT2 7NZ, U.K.
- Lisa Bonheme
- University of Kent, School of Computing, Canterbury CT2 7NZ, U.K.
- Howard Bowman
- University of Birmingham, School of Psychology and Computer Science, Birmingham B15 2TT, U.K.
- University College London, Wellcome Centre for Human Neuroimaging (honorary), London WC1N 3AR, U.K.
3
Champion T, Grześ M, Bowman H. Branching Time Active Inference with Bayesian Filtering. Neural Comput 2022; 34:2132-2144. [PMID: 36027722] [DOI: 10.1162/neco_a_01529]
Abstract
Branching time active inference is a framework that proposes to view planning as a form of Bayesian model expansion. Its roots can be found in active inference, a neuroscientific framework widely used for brain modeling, as well as in Monte Carlo tree search, a method broadly applied in the reinforcement learning literature. Until now, inference over the latent variables has been carried out by taking advantage of the flexibility offered by variational message passing, an iterative process that can be understood as sending messages along the edges of a factor graph. In this letter, we harness the efficiency of an alternative method for inference, Bayesian filtering, which does not require iterating the update equations until the variational free energy converges. Instead, this scheme alternates between two phases: integration of evidence and prediction of future states. Both phases can be performed efficiently, and this provides a fortyfold speedup over the state of the art.
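A minimal sketch of the alternation the abstract describes, assuming a discrete hidden Markov model with likelihood matrix A and transition matrix B (names and array layout are illustrative, not the paper's implementation): each step integrates the current observation into the belief, then predicts the next state, with no iteration to convergence:

```python
import numpy as np

def bayesian_filter(A, B, prior, observations):
    """Forward (filtering) pass over a discrete hidden Markov model.

    A            : (n_obs, n_states) likelihood matrix p(o_t | s_t)
    B            : (n_states, n_states) transition matrix, B[i, j] = p(s_t = i | s_{t-1} = j)
    prior        : (n_states,) initial belief over states
    observations : sequence of observed outcome indices
    """
    belief = np.asarray(prior, dtype=float)
    beliefs = []
    for o in observations:
        # 1) Integration of evidence: weight the belief by the likelihood of o
        belief = A[o] * belief
        belief /= belief.sum()
        beliefs.append(belief.copy())
        # 2) Prediction: push the belief through the transition model
        belief = B @ belief
    return np.stack(beliefs)
```

Because each phase is a closed-form matrix operation, the per-timestep cost is fixed, unlike variational message passing, which iterates its update equations until the variational free energy converges.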
Affiliation(s)
- Marek Grześ
- University of Kent, School of Computing, Canterbury CT2 7NZ, U.K.
- Howard Bowman
- University of Birmingham, School of Psychology, Birmingham B15 2TT, U.K.; University of Kent, School of Computing, Canterbury CT2 7NZ, U.K.