Action Selection and Operant Conditioning: A Neurorobotic Implementation

<div>Plots of the neural spikes and the STDP rule factors over 24000 cycles. The first five plots (A to E) concern the green block and the green LED. The middle five capture the logic of the yellow block and the yellow LED. The last five are associated with the red color. By around cycle 8500, the robot has learned to perform the right action for the appropriate cue. Learning proceeds as follows: when an action is triggered and the predictor neuron then spikes in response to a subsequent reward, the STDP coefficient increases, strengthening the synaptic weight between the sensor and the predictor. Once this synaptic weight reaches a critical threshold, the sensor input alone makes the reinforced predictor neuron spike, without the need for the reward.</div>
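The mechanism described in the caption can be illustrated with a toy sketch. This is not the authors' model: the function names, the additive weight update, and all numeric values (`w`, `threshold`, `gain`) are assumptions chosen only to show how a reward-driven predictor spike can potentiate the sensor-to-predictor synapse until the sensor alone suffices.

```python
def predictor_fires(w, sensor_spike, reward, threshold=1.0):
    """Hypothetical predictor neuron: spikes if the reward drives it,
    or if the sensor input through weight w reaches the threshold."""
    sensor_drive = w if sensor_spike else 0.0
    return reward or sensor_drive >= threshold


def train(n_trials=50, w=0.2, threshold=1.0, gain=0.1):
    """Repeat cue -> correct action -> reward trials (assumed schedule).

    Returns the final weight and the first trial on which the sensor
    input by itself would have fired the predictor.
    """
    first_unaided = None
    for trial in range(n_trials):
        sensor_spike, reward = True, True  # assumed: cue and reward on every trial
        unaided = predictor_fires(w, sensor_spike, reward=False, threshold=threshold)
        if unaided and first_unaided is None:
            first_unaided = trial
        # Simplified STDP-like step: the sensor spike precedes a predictor
        # spike, so the synapse is potentiated, capped at the threshold.
        if sensor_spike and predictor_fires(w, sensor_spike, reward, threshold):
            w = min(w + gain, threshold)
    return w, first_unaided


if __name__ == "__main__":
    final_w, first_unaided = train()
    print(final_w, first_unaided)
```

In this sketch the predictor initially spikes only because of the reward. After enough paired trials the weight reaches the threshold, and the cue alone is then sufficient, which mirrors the transition the caption places around cycle 8500 in the recorded data.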

Journal of Robotics

Figure 7: Action Selection and Operant Conditioning: A Neurorobotic Implementation