Computer model explains how animals select actions with rewarding outcomes

Scientists from the universities of Manchester and Sheffield have developed a computer model charting what happens in the brain when an action is chosen that leads to a reward.

The model could provide new insights into the mechanisms behind motor disorders such as Parkinson’s disease. It may also shed light on conditions involving abnormal learning, such as addiction.

Dr Mark Humphries from The University of Manchester explains the research: “We wanted to look at how we learn from feedback – particularly how we learn to associate actions to new unexpected outcomes. To do this we created a series of computational models to show how the firing of dopamine neurons caused by receiving reward ultimately translates into selecting the causative action more frequently in the future.”

Learning to associate rewarding outcomes with specific actions is a key part of survival, for example when searching for food or avoiding predators. It is already known that actions are represented in the cortex, the brain’s outer layer of neural tissue, and that rewarding outcomes activate neurons that release a brain chemical called dopamine.

These neuronal signals are sent to another area of the brain, the striatum, which is the input station for a collection of brain structures called the basal ganglia and plays an important role in selecting which action to take.

Collectively, this evidence suggests that dopamine signals change the strength of connections between cortical and striatal neurons, thereby determining which action is appropriate for a specific set of circumstances. Until now, however, no model had integrated these strands of evidence to test that hypothesis.
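
As a rough illustration of that idea, and not the researchers' published model, the short Python sketch below shows how a dopamine signal could strengthen or weaken the connection for each action. The update rule, the function names, the learning rate and the dopamine values are all assumptions chosen for simplicity.

```python
# Illustrative sketch only: a simple dopamine-gated weight update.
# All names and numbers here are hypothetical, not taken from the paper.

def update_weight(weight, cortical, striatal, dopamine, learning_rate=0.1):
    """Change a connection in proportion to cortical activity, striatal
    activity and the dopamine signal (positive = burst, negative = dip)."""
    new_weight = weight + learning_rate * cortical * striatal * dopamine
    # Keep the connection strength within the range [0, 1].
    return max(0.0, min(1.0, new_weight))


def train(rewarded_action, n_actions=3, trials=20):
    """Try every action on every trial; only one action leads to reward."""
    weights = [0.5] * n_actions
    for _ in range(trials):
        for action in range(n_actions):
            # Assumed dopamine signal: a burst after reward, a dip otherwise.
            dopamine = 1.0 if action == rewarded_action else -0.5
            weights[action] = update_weight(weights[action], 1.0, 1.0, dopamine)
    return weights


if __name__ == "__main__":
    print(train(rewarded_action=1))
```

In this toy version, the connection for the rewarded action grows while the others shrink, so the rewarded action would be favoured in future.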

Dr Humphries explains why they created the model: “Essentially within this area of research we are tackling a puzzle in which we have an unknown number of pieces and no picture to guide us. Some pieces have been intensively studied individually, so the questions were: could we put the pieces of the puzzle together and prove that they made a coherent picture? And could we guess at the missing pieces? The only way to build the puzzle from the individual pieces was through using a computational model, which allows us to do things impossible in experiments - not least, provide solutions and guesses for the unknown, missing pieces.”

Their model revealed how several brain signals work together to shape the inputs from the cortex to the basal ganglia so the appropriate action is chosen.

Professor Kevin Gurney says: “The computational framework works across several scales of description, linking data on plastic change at single synapses between cortex and striatum, with behavioural data on learning the association between actions and outcomes. The model reveals that the relative strength of cortical inputs, which represent different possible actions, to the two populations of dopamine responsive cells, determines whether an action is selected or suppressed.”

He continues: “Moreover, the correct timing of neuronal activity, the type of dopamine responsive cells, and dopamine level are necessary for generating neuronal activity patterns that result in successful learning.”
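
The balance Professor Gurney describes can be pictured with a very small Python sketch. It is a simplification under stated assumptions, not the published framework: the two populations are given the hypothetical labels "select" and "suppress", and the comparison rule is an illustrative choice.

```python
# Illustrative sketch only: whether an action is selected depends on the
# relative strength of cortical input to two cell populations.
# The labels "select" and "suppress" are hypothetical.

def action_outcome(w_select, w_suppress, cortical_drive=1.0):
    """Compare the cortical input reaching the two populations."""
    net = cortical_drive * (w_select - w_suppress)
    return "selected" if net > 0 else "suppressed"


if __name__ == "__main__":
    print(action_outcome(0.8, 0.3))  # stronger input to the "select" cells
    print(action_outcome(0.2, 0.6))  # stronger input to the "suppress" cells
```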

Dr Humphries concludes: “The fact that the pieces of our puzzle all fitted together to produce a single coherent picture is evidence that we (as a field) are converging on a complete theory for how the brain learns from reward.”

This study provides strong support for the hypothesis that cortical inputs to neurons in the striatum are crucial for learning the association between action and outcome.

Looking ahead, the model provides a common framework in which to place new findings on all aspects of learning from outcomes. In the clinical realm, it could also offer new insights into the mechanisms behind motor disorders and shed light on abnormal learning in conditions such as addiction, where the association between action and outcome becomes so strong that the action is chosen repeatedly even when it is not appropriate. The striatum is the focus of much addiction research precisely because of its proposed role in learning this association, and because many drugs, such as cocaine, interfere with the dopamine system.

Notes for editors

The paper “New framework for cortico-striatal plasticity: behavioural theory meets in vitro data at the reinforcement-action interface” has been published in PLOS Biology.

The research was carried out jointly by Professors Kevin Gurney and Peter Redgrave of the University of Sheffield and Dr Mark Humphries of The University of Manchester.

For interview requests please contact:

Morwenna Grills
Media Relations Officer
The University of Manchester

Tel: +44 (0)161 275 2111
Mob: +44 (0)7920 087466
Email: Morwenna.Grills@manchester.ac.uk