Reinforcement learning

Markov decision processes

Simple reinforcement learning

Q learning

An example

This example uses a discount factor γ = 0.8 and a learning rate η = 0.5, with all Q values initialized to 0. In the chart, x is the current state, u is the action taken (r for right, l for left), and "new" means the reinforcement received plus the discounted maximum Q value of the next state. The "new" value is combined with the "old" Q value using the learning rate to give the updated Q value, which appears in the next line of the chart. (Note: to illustrate how the agent can learn to "look ahead", it is effectively picked up after it reaches the goal state, state 4, and dropped back in state 1. There is no "natural" way of reaching state 1 from state 4.)
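The update described above can be written out as follows. This is a sketch of the rule as the paragraph states it, not a quotation from the notes: the symbols R (reinforcement received), x' (next state) and u' (an action available in x') are notation assumed here for illustration. The last two lines work through the sixth row of the chart, where the agent is in state 2 and moves right into state 3, whose best Q value is 0.5.

```latex
\begin{align*}
\text{new} &= R + \gamma \max_{u'} Q(x', u') \\
Q(x, u) &\leftarrow (1 - \eta)\, Q(x, u) + \eta \cdot \text{new} \\[4pt]
\text{new} &= 0 + 0.8 \times 0.5 = 0.4 \\
Q(2, r) &\leftarrow 0.5 \times 0 + 0.5 \times 0.4 = 0.2
\end{align*}
```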

x   Q(1,r)  Q(2,r)  Q(2,l)  Q(3,r)  Q(3,l)  Q(4,l)  new  u
1   0       0       0       0       0       0       0    r
2   0       0       0       0       0       0       0    r
3   0       0       0       0       0       0       1    r
4   0       0       0       .5      0       0       0    l
1   0       0       0       .5      0       0       0    r
2   0       0       0       .5      0       0       .4   r
3   0       .2      0       .5      0       0       1    r
4   0       .2      0       .75     0       0       0    l
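The chart can be reproduced with a few lines of Python. This is a minimal sketch under assumptions that are inferred from the chart rather than stated in the notes: a reinforcement of 1 is received on entering state 4 and 0 otherwise, and taking action l in state 4 stands in for the "pick up and drop in state 1" reset. The names `ACTIONS`, `step`, `q_update` and `run` are hypothetical helpers introduced for this illustration.

```python
GAMMA = 0.8  # discount factor (γ in the text)
ETA = 0.5    # learning rate (η in the text)

# Actions available in each state, matching the Q columns of the chart.
ACTIONS = {1: ["r"], 2: ["r", "l"], 3: ["r", "l"], 4: ["l"]}


def step(x, u):
    """Return (reinforcement, next_state) for taking action u in state x.

    Assumption: entering state 4 yields reinforcement 1; any action taken
    in state 4 models the reset that drops the agent back in state 1.
    """
    if x == 4:
        return 0.0, 1
    nxt = x + 1 if u == "r" else x - 1
    return (1.0 if nxt == 4 else 0.0), nxt


def q_update(q, x, u):
    """Apply one Q-learning update in place; return (new, next_state)."""
    reward, nxt = step(x, u)
    new = reward + GAMMA * max(q[(nxt, a)] for a in ACTIONS[nxt])
    q[(x, u)] = (1 - ETA) * q[(x, u)] + ETA * new
    return new, nxt


def run(actions_taken, start=1):
    """Replay a fixed action sequence; return final Q and one row per step."""
    q = {(x, a): 0.0 for x, acts in ACTIONS.items() for a in acts}
    x, rows = start, []
    for u in actions_taken:
        before = dict(q)  # Q values shown on this line of the chart
        new, nxt = q_update(q, x, u)
        rows.append((x, before, new, u))
        x = nxt
    return q, rows


if __name__ == "__main__":
    final_q, rows = run(list("rrrlrrrl"))
    for x, before, new, u in rows:
        qs = "  ".join(f"{before[k]:.2f}" for k in sorted(before))
        print(f"{x}  {qs}  new={new:.2f}  u={u}")
```

Each printed row shows the Q values before the update on that line, as in the chart; the effect of the final step is visible only in the returned `final_q`.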

Making decisions

Implementing Q learning
