Instead, we usually move relatively slowly in the direction indicated by the
current evidence: a learning rate (η), with 0 < η ≤ 1, controls the step
size of each update.
With this, the update equation is:
`Q^{t+1}(s_t, a_t) = (1 - \eta) Q^t(s_t, a_t) + \eta [r(s_t, a_t) + \gamma \max_{a_{t+1}} Q^t(s_{t+1}, a_{t+1})]`
This equation combines the old knowledge that the agent has with
the new information coming from the current experience of receiving a
reinforcement and ending up in a new state.
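
As a rough illustration, the update can be sketched in Python for a small tabular problem. This is a minimal sketch under stated assumptions, not a definitive implementation: the function name `q_update`, the dictionary keyed by `(state, action)` pairs, the state and action labels, and the values chosen for η and γ are all hypothetical.

```python
from collections import defaultdict


def q_update(Q, s, a, r, s_next, actions, eta=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new value.

    Q is a mapping from (state, action) pairs to value estimates
    (an assumed representation, chosen for brevity).
    """
    # Best value currently estimated for the state the agent ended up in.
    best_next = max(Q[(s_next, a_next)] for a_next in actions)
    # New information: the reward received plus the discounted future value.
    target = r + gamma * best_next
    # Blend the old estimate with the new target, weighted by the learning rate.
    Q[(s, a)] = (1 - eta) * Q[(s, a)] + eta * target
    return Q[(s, a)]


# Hypothetical two-state, two-action example.
Q = defaultdict(float)        # unseen (state, action) pairs start at 0.0
Q[("s1", "right")] = 2.0      # pretend the agent already learned this value
new_value = q_update(
    Q, "s0", "right", r=1.0, s_next="s1",
    actions=["left", "right"], eta=0.5, gamma=0.9,
)
print(new_value)
```

With these illustrative numbers the target is 1.0 + 0.9 × 2.0 = 2.8, and with η = 0.5 the old estimate of 0 moves halfway toward it, to about 1.4. A smaller η would move the estimate less per step, which is the slow movement described above.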