Content-addressable memories: what they should do
- When part of a familiar pattern enters the memory system,
the system fills in the missing parts (recall).
- When a familiar pattern enters the memory system, the
response is a stronger version of the input (recognition).
- When an unfamiliar pattern enters the memory system, it is
dampened (unfamiliarity).
- When a pattern similar to a stored pattern enters the memory
system, the response is a version of the input distorted toward the stored
pattern (assimilation).
- When a number of similar patterns have been stored, the system
responds to the central tendency of the stored patterns, even if
the central tendency itself never appeared (prototype effects).
- Basic properties
- CAM
- Potentially completely recurrent
- Symmetric weights
- Activation rule (θ_i is the threshold of unit i; sgn() is 1 if its argument is positive, -1
otherwise):
`x_i(t + 1) = text(sgn)(sum_j w_(ij) x_j(t) - theta_i)`
- Settling: asynchronous, random update
- Training: single presentation of each pattern
- Each training pattern should yield a (fixed-point) attractor.
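The activation rule and the asynchronous settling procedure listed above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the names `sgn`, `update_unit`, and `settle` are hypothetical, and visiting the units in a freshly shuffled order on each sweep is only one possible reading of "asynchronous, random update".

```python
import random

def sgn(z):
    """Return 1 if z is positive, -1 otherwise (the convention used in the notes)."""
    return 1 if z > 0 else -1

def update_unit(w, x, i, theta=None):
    """Apply the activation rule to unit i in place; return True if x[i] changed."""
    th = 0.0 if theta is None else theta[i]
    net = sum(w[i][j] * x[j] for j in range(len(x))) - th
    new = sgn(net)
    changed = new != x[i]
    x[i] = new
    return changed

def settle(w, x, theta=None, rng=None, max_sweeps=100):
    """Update units one at a time in random order until a full sweep changes nothing."""
    rng = rng or random.Random(0)  # fixed seed is an assumption, for repeatability
    x = list(x)                    # work on a copy of the starting state
    for _ in range(max_sweeps):
        order = list(range(len(x)))
        rng.shuffle(order)
        changed = False
        for i in order:
            if update_unit(w, x, i, theta):
                changed = True
        if not changed:
            break
    return x

# Toy example: symmetric weights built from the single pattern [1, -1, 1].
pattern = [1, -1, 1]
w = [[pattern[i] * pattern[j] for j in range(3)] for i in range(3)]
result = settle(w, [1, 1, 1])  # start from a corrupted version of the pattern
```

In this toy case only the middle unit receives negative input, so the state settles on the stored pattern regardless of the update order.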
- Stability
- Lyapunov stability: if there is a function of the network
state that is bounded below and that decreases or stays the same as the network is updated, then the
network is asymptotically stable.
- Energy of network (a Lyapunov function):
`E = -1/2 sum_i sum_j w_(ij) x_i x_j`
- For symmetric weights, this can be rewritten as
`E = -1/2 sum_i w_(ii) x_i^2 - sum_((ij)) w_(ij) x_i x_j = C - sum_((ij)) w_(ij) x_i x_j`
where (ij) refers to distinct pairs of indices, and C is a constant (because `x_i^2 = 1`).
- The activation rule minimizes energy
- Assuming no thresholds, for a given updated unit i, either its activation is
unchanged, in which case the energy is unchanged,
or it is negated, in which case `x_i` and
`sum_j w_(ij) x_j` have opposite signs, and
`x_i prime = -x_i`, where `x_i prime` is the activation of unit i following
the update.
- Then the difference between the energy after and before the update of
unit i is
`E prime - E = - sum_(j ne i) w_(ij) x_i prime x_j + sum_(j ne i) w_(ij) x_i x_j`
`= 2 sum_(j ne i) w_(ij) x_i x_j`
`= 2 x_i sum_(j ne i) w_(ij) x_j`
`= 2 x_i sum_j w_(ij) x_j - 2 w_(ii) x_i^2`
`= 2 x_i sum_j w_(ij) x_j - 2 w_(ii)`
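- The first term is negative or zero because `x_i` and `sum_j w_(ij) x_j` do not share a sign, and the second is negative or zero as long as `w_(ii) >= 0`, so, for asynchronous updates,
the energy always either remains the same or decreases.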
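The argument above can be checked numerically. The sketch below records the energy after every single-unit update and confirms that it never rises. It assumes no thresholds, symmetric weights, and a non-negative diagonal; the random weights, the network size, and the names `energy`, `async_step`, and `energy_trace` are all illustrative choices, not part of the notes.

```python
import random

def energy(w, x):
    """E = -1/2 * sum_i sum_j w_ij x_i x_j."""
    n = len(x)
    return -0.5 * sum(w[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def async_step(w, x, i):
    """Update unit i in place with the sgn rule (no thresholds)."""
    net = sum(w[i][j] * x[j] for j in range(len(x)))
    x[i] = 1 if net > 0 else -1

def energy_trace(w, x, steps, rng):
    """Energy before the first update and after each of `steps` random single-unit updates."""
    x = list(x)
    trace = [energy(w, x)]
    for _ in range(steps):
        async_step(w, x, rng.randrange(len(x)))
        trace.append(energy(w, x))
    return trace

rng = random.Random(1)
n = 8
# Random symmetric weights; the diagonal is forced to be non-negative (assumption).
w = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i, n):
        v = rng.uniform(-1.0, 1.0)
        w[i][j] = w[j][i] = abs(v) if i == j else v

x0 = [rng.choice((-1, 1)) for _ in range(n)]
trace = energy_trace(w, x0, 200, rng)

# Small tolerance only to absorb floating-point rounding.
non_increasing = all(b <= a + 1e-12 for a, b in zip(trace, trace[1:]))
print("energy never increased:", non_increasing)
```

Updating all units at once (synchronously) is not covered by the argument, so this check should only be expected to hold for one-unit-at-a-time updates.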
- Learning
- Storing Q memories in a Hopfield net:
`w_(ij) = sum_(p=1)^Q x_i^p x_j^p`
- Hebbian learning: the weight on the connection joining two
units is proportional to the correlation between their activations.
- For one pattern p, we get stability if, for all i, the sign of the input to i is the same as its activation:
`text(sgn)(sum_j^N w_(ij) x_j^p) = x_i^p`
- The expression in parentheses (the input to unit i) is
`sum_j^N sum_q^Q x_i^q x_j^q x_j^p = sum_j^N x_i^p (x_j^p)^2 + sum_j^N sum_(q ne p)^Q x_i^q x_j^q x_j^p`
`= N x_i^p + sum_j^N sum_(q ne p)^Q x_i^q x_j^q x_j^p`
- If the magnitude of the second term, the crosstalk term, is
less than N, then the expression has the same sign as `x_i^p`, and pattern p is stable.
- Crosstalk is high (the magnitude is what matters, not the sign) when a given unit tends to be activated by a large number of units in one or more patterns but inhibited by the units in another pattern.
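The storage rule and the split of a unit's input into a signal term `N x_i^p` and a crosstalk term can be made concrete with a small sketch. The helper names (`hebbian_weights`, `net_input`, `crosstalk`, `is_stable`) and the two 4-unit patterns are hypothetical; the patterns are chosen to be orthogonal so that the crosstalk vanishes.

```python
def hebbian_weights(patterns):
    """w_ij = sum_p x_i^p x_j^p (self-connections included, as in the formula above)."""
    n = len(patterns[0])
    return [[sum(p[i] * p[j] for p in patterns) for j in range(n)] for i in range(n)]

def net_input(w, x, i):
    """Input to unit i: sum_j w_ij x_j."""
    return sum(w[i][j] * x[j] for j in range(len(x)))

def crosstalk(patterns, p, i):
    """Second term of the input to unit i when stored pattern number p is presented."""
    n = len(patterns[p])
    return sum(patterns[q][i] * patterns[q][j] * patterns[p][j]
               for q in range(len(patterns)) if q != p
               for j in range(n))

def is_stable(patterns, p):
    """True if every unit's input has the same sign as its activation in pattern p."""
    w = hebbian_weights(patterns)
    x = patterns[p]
    return all((1 if net_input(w, x, i) > 0 else -1) == x[i] for i in range(len(x)))

# Two orthogonal patterns on N = 4 units (illustrative values).
patterns = [[1, 1, 1, 1], [1, -1, 1, -1]]
w = hebbian_weights(patterns)
n = len(patterns[0])

for p, x in enumerate(patterns):
    for i in range(n):
        # The input decomposes exactly into signal plus crosstalk.
        assert net_input(w, x, i) == n * x[i] + crosstalk(patterns, p, i)
    print("pattern", p, "stable:", is_stable(patterns, p))
```

Because the two patterns are orthogonal, every crosstalk term here is zero and each input is exactly `N x_i^p`; correlated patterns would give non-zero crosstalk, and a pattern is lost once that term outweighs N with the wrong sign.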
- Capacity of a network
- Crosstalk between patterns limits the number of patterns that can be stored.
- The number of random patterns that can be stored is proportional to N if a small
percentage of errors is tolerated, but the constant of proportionality is quite small (on the order of 0.1).
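One rough way to see the capacity limit is to store Q random patterns with the Hebbian rule and count how many bits of the stored patterns are no longer fixed under a single update. The sketch below is an illustrative experiment under assumptions that are not in the notes: N = 100, the listed values of Q, a fixed random seed, and the hypothetical helper `unstable_fraction`. It keeps self-connections, as the storage formula above does, and computes inputs from pattern overlaps instead of building the weight matrix.

```python
import random

def unstable_fraction(n, q, rng):
    """Fraction of bits, over all q stored random patterns, whose input has the wrong sign."""
    pats = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(q)]
    # overlap[a][b] = sum_j x_j^a x_j^b
    overlap = [[sum(u * v for u, v in zip(pa, pb)) for pb in pats] for pa in pats]
    bad = 0
    for p in range(q):
        for i in range(n):
            # sum_j w_ij x_j^p  with  w_ij = sum_r x_i^r x_j^r
            net = sum(pats[r][i] * overlap[r][p] for r in range(q))
            if (1 if net > 0 else -1) != pats[p][i]:
                bad += 1
    return bad / (n * q)

rng = random.Random(0)
n = 100
results = {q: unstable_fraction(n, q, rng) for q in (1, 5, 10, 20, 40, 60)}
for q, frac in results.items():
    print(f"Q={q:3d}  Q/N={q / n:.2f}  unstable fraction={frac:.4f}")
```

With a single stored pattern there is no crosstalk, so the unstable fraction is exactly zero; how quickly it grows with Q/N depends on the random draw and on whether self-connections are kept.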
Variations on the Hopfield model
- Continuous activation
- Hidden units
- Time-delay connections