Neural Networks in SALSA
- Network of interconnected processing units.
Each connection in the
network has an associated weight.
- Each unit has an activation,
which varies relatively rapidly.
The pattern of activation across the network usually represents
the network's short-term memory.
- Some of the units are designated input units.
These are clamped
to particular activation
values when the network is presented with an input pattern.
Some of the units may also be designated output units; their activations
represent the network's response.
Some of the units may also be designated hidden units.
These are units which are neither input nor output.
They do not make contact with the environment and function as
a place for the re-representation of input patterns, usually on the
way to the output units.
- Each unit (unless clamped) repeatedly updates its activation.
It first calculates its current input from all of the other units that
it is connected to.
This is the sum, over those units, of each unit's activation
times the weight on the connection.
The unit then calculates its new activation, based on the input, and
possibly its old activation.
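A single update step can be sketched as follows; the logistic squashing function here is an illustrative assumption, since the notes do not fix a particular activation function:

```python
import math

def update_activation(weights, activations):
    # Net input: sum of the inputs from each connected unit, where each
    # input is that unit's activation times the connection weight.
    net_input = sum(w * a for w, a in zip(weights, activations))
    # New activation from the net input (logistic squashing function
    # assumed here for illustration).
    return 1.0 / (1.0 + math.exp(-net_input))
```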
- Weights in the network are changed relatively slowly. They represent
the network's long-term memory.
The way weights change depends on the network's learning algorithm.
Weight changes are usually Hebbian;
that is, they are proportional to the
activations of the units on either end of the connection.
In error-driven learning, the weight changes are based on a
function of the difference between the network's output for a given
input pattern and a target, that is,
what the network should have output for that pattern.
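The two kinds of weight change can be contrasted in a short sketch; the function names and the learning-rate value are hypothetical:

```python
def hebbian_update(w, pre, post, lr=0.1):
    # Hebbian: change proportional to the activations of the units
    # on either end of the connection.
    return w + lr * pre * post

def error_driven_update(w, pre, output, target, lr=0.1):
    # Error-driven: change proportional to the difference between the
    # target and the network's actual output (delta-rule sketch).
    return w + lr * (target - output) * pre
```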
- The network is trained on the basis of a set of training patterns
(inputs and sometimes targets).
Later it may respond appropriately to novel input patterns; that is,
it may generalize.
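Training and generalization can be illustrated with a minimal sketch: a single linear output unit trained with the delta rule on input-target pairs. The patterns, learning rate, and epoch count here are made up for illustration:

```python
def train(weights, patterns, epochs=20, lr=0.5):
    # Repeatedly present each (input, target) training pair and apply
    # an error-driven weight change to the single output unit.
    for _ in range(epochs):
        for inputs, target in patterns:
            output = sum(w * x for w, x in zip(weights, inputs))
            error = target - output
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
    return weights

# Train on two patterns; the learned weights can then be applied to
# novel input patterns that resemble the training set.
w = train([0.0, 0.0], [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.0)])
```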
Neural Networks for Reinforcement Learning
We can replace the lookup table for Q-learning with a neural network.
Each state is represented by a pattern over the input units; each
input unit represents a sensory input feature.
Each action is represented by a single output unit.
The network's stored Q-values for the actions associated with a
given state are found by activating the input units with the
pattern representing that state and observing the activations
of the output units.
Each output unit's activation represents the Q-value for the
corresponding action in the given state.
The Q-values are actually stored in the network's weights.
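Reading out the stored Q-values can be sketched with a single-layer linear network; that architecture is an assumption for illustration, and a real network would typically include hidden units:

```python
def q_values(weights, state):
    # weights: one row of connection weights per output unit (action);
    # state: the activation pattern clamped onto the input units.
    # Each output unit's activation is the Q-value for its action.
    return [sum(w * s for w, s in zip(row, state)) for row in weights]
```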
During Q-learning, the agent selects an action in the current state,
makes a response, and receives a reinforcement from the environment.
A new Q-value for the state-action pair is calculated on the basis of
the reinforcement and the stored maximum Q-value for the next state
(found by running the network) using the first update equation.
This value is treated as the
target for the output unit corresponding to the selected
action, and the weights in the network are updated using error-driven
learning (the error is the difference between the target and the
output unit's actual activation). The learning rate is a number
between 0 and 1, corresponding to eta in the second update equation;
it controls the rate at which the weights are updated.
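Putting the pieces together, one Q-learning step might look like the following sketch, again assuming a single-layer linear network; the gamma and eta values are placeholders:

```python
def q_learning_step(weights, state, action, reward, next_state,
                    gamma=0.9, eta=0.1):
    # Stored Q-value for the selected action in the current state.
    output = sum(w * s for w, s in zip(weights[action], state))
    # Target: reinforcement plus the discounted maximum Q-value for the
    # next state, found by running the network on that state.
    target = reward + gamma * max(
        sum(w * s for w, s in zip(row, next_state)) for row in weights)
    # Error-driven update of the selected action's weights; eta is the
    # learning rate, the error is (target - output).
    error = target - output
    weights[action] = [w + eta * error * s
                       for w, s in zip(weights[action], state)]
    return weights
```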