Hoang H, Tsutsumi S, Matsuzaki M, Kano M, Toyama K, Kitamura K, Kawato M. Predictive reward-prediction errors of climbing fiber inputs integrate modular reinforcement learning with supervised learning. PLoS Comput Biol 2025;21:e1012899. [PMID: 40096178; PMCID: PMC11957396; DOI: 10.1371/journal.pcbi.1012899]
Abstract
Although the cerebellum is typically associated with supervised learning algorithms, it is also extensively involved in reward processing. In this study, we investigated the cerebellum's role in executing reinforcement learning algorithms, with a particular emphasis on reward-prediction errors. We employed a Q-learning model to reproduce the licking responses of mice in a Go/No-go auditory-discrimination task. This approach enabled the calculation of reinforcement-learning variables, such as reward, predicted reward, and reward-prediction error, in each learning trial. Through tensor component analysis of two-photon Ca²⁺ imaging data from more than 6,000 Purkinje cells, we found that climbing fiber inputs in two distinct components, which were specifically activated during Go and No-go cues in the learning process, showed an inverse relationship with predictive reward-prediction errors. Assuming bidirectional plasticity at parallel fiber-Purkinje cell synapses, we constructed a cerebellar neural-network model with 5,000 spiking neurons comprising granule cells, Purkinje cells, cerebellar nuclei neurons, and inferior olive neurons. The network model qualitatively reproduced distinct changes in licking behaviors, climbing-fiber firing rates, and their synchronization during discrimination learning, separately for the Go and No-go conditions. We found that Purkinje cells in the two components could develop specific motor commands for their respective auditory cues, guided by the predictive reward-prediction errors conveyed by their climbing fiber inputs. These results indicate a possible role for context-specific actors in modular reinforcement learning that integrates with the cerebellum's supervised-learning capabilities.
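The abstract describes deriving trial-by-trial reward-prediction errors from a Q-learning model of the Go/No-go task. The sketch below is a minimal illustration of that idea, not the authors' implementation: the cue labels, action set, reward scheme, learning rate, and softmax action selection are all assumptions introduced here for clarity.

```python
import numpy as np

# Illustrative tabular Q-learning for a Go/No-go licking task.
# All labels and parameter values below are assumed, not taken from the paper.
CUES = ["go", "no-go"]
ACTIONS = ["lick", "no-lick"]
ALPHA = 0.1  # learning rate (assumed)

rng = np.random.default_rng(0)
Q = {cue: {a: 0.0 for a in ACTIONS} for cue in CUES}

def reward(cue, action):
    # Typical task structure: licking to Go is rewarded, licking to No-go is penalized.
    if cue == "go" and action == "lick":
        return 1.0
    if cue == "no-go" and action == "lick":
        return -1.0
    return 0.0

for trial in range(1000):
    cue = rng.choice(CUES)
    # Softmax action selection over current Q-values for this cue.
    q = np.array([Q[cue][a] for a in ACTIONS])
    p = np.exp(q) / np.exp(q).sum()
    action = rng.choice(ACTIONS, p=p)

    r = reward(cue, action)
    # Reward-prediction error: obtained reward minus predicted reward Q(cue, action).
    rpe = r - Q[cue][action]
    Q[cue][action] += ALPHA * rpe
```

In a model of this kind, the per-trial `rpe` values are the quantity that would be compared against climbing-fiber activity, with the sign and magnitude of the error differing between Go and No-go trials as learning progresses.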