Hetero-associative content-addressable memories
- Hetero-associative content-addressable memory
- For each of one set of patterns ("addresses"), store
an element of another set ("data").
- Later, presented with an input pattern,
find the best-matching address and
retrieve the data stored there.
- Feedforward, hetero-associative networks with fixed input-hidden
weights
- Sparse Distributed Memory (Kanerva, illustrations)
- Input patterns (addresses) and output patterns
(data) are very long binary (-1,+1) vectors (e.g., 1000 bits).
- Number of hidden layer units (M) is the number of hard
memory locations. This is much smaller than the
address space, hence the sparseness.
- The input-to-hidden weights (the address matrix A) are
fixed weights of 0 or 1,
specifying the location in address space of the hard
locations.
- The hidden layer input function d is just the Hamming distance of the input
pattern from the hard location.
- The hidden layer activation function s thresholds the hidden layer
inputs at some fixed radius from the memory
locations: a unit is activated (selected) when its hard location lies within that radius of the input.
- The hidden-to-output weights (the contents matrix C)
are learned on the basis
of single presentations of target (data) vectors.
For each of the selected hard addresses (hidden-layer units),
the output (data) vector is added to the weights out of that
unit.
- The output layer input function is just the sum of the
weights of all of the activated (selected) hard addresses
(hidden-layer units).
This determines
the overlap between the selected set of memory locations
and the stored data patterns.
- The output layer activation function thresholds these inputs.
- Storing pairs of patterns
- Presenting an address x to the network
- Determining the set of memory locations within the Hamming
hypersphere of x, that is, s
- Adding the data vector to the contents matrix rows for the selected
locations
- Retrieving a data vector y for an address x
- Presenting the address to the network
- Determining the set of memory locations within the Hamming sphere
of the address
- Summing the selected rows in the contents matrix
- Thresholding the result
- Error depends on the extent to which an input pattern (address)
accesses memory locations which are close to stored
addresses other than the right one.
- Why it works: redundant storage allows for retrieval with low
error rates
- Auto-association: the address and data vectors are the same
- Sequences: data vectors are addresses in other associations
- Example
- Training set

| Addresses | Data |
|---|---|
| + – – + + + | + + – + |
| – + + + + – | + – + – |
| – + – + – + | – – + + |

- Hidden layer (hard locations)

| Hard locations |
|---|
| + – + + + – |
| + + – + – + |
| – + + – + – |

- Storing address-datum pairs

| Address + – – + + + | Distance | Contents matrix |
|---|---|---|
| + – + + + – | 2 * | 1 1 –1 1 |
| + + – + – + | 2 * | 1 1 –1 1 |
| – + + – + – | 5 | 0 0 0 0 |

- Retrieval of data vector given address vector

| Address + + + + + – | Distance | Selected | Contents matrix |
|---|---|---|---|
| + – + + + – | 1 * | → | 2 0 0 0 |
| + + – + – + | 3 | | 0 0 0 2 |
| – + + – + – | 2 * | → | 1 –1 1 –1 |
| | | Sum | 3 –1 1 –1 |
| | | Thresholded | + – + – |

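
The worked example above can be reproduced with a short Python sketch. It assumes a selection radius of 2 (inferred from the starred distances in the tables), encodes + and – as +1 and -1, and breaks ties at a sum of zero in favor of +1. The function names `hamming`, `select`, `store`, and `retrieve` are illustrative choices, not part of Kanerva's formulation.

```python
# Sketch of the worked example: 3 hard locations, 6-bit addresses, 4-bit data.
RADIUS = 2  # assumption: inferred from the starred distances in the example

HARD_LOCATIONS = [
    [+1, -1, +1, +1, +1, -1],
    [+1, +1, -1, +1, -1, +1],
    [-1, +1, +1, -1, +1, -1],
]

TRAINING_SET = [  # (address, datum) pairs from the example
    ([+1, -1, -1, +1, +1, +1], [+1, +1, -1, +1]),
    ([-1, +1, +1, +1, +1, -1], [+1, -1, +1, -1]),
    ([-1, +1, -1, +1, -1, +1], [-1, -1, +1, +1]),
]


def hamming(u, v):
    """Number of positions at which two vectors differ."""
    return sum(a != b for a, b in zip(u, v))


def select(address):
    """Indices of the hard locations within RADIUS of the address."""
    return [i for i, loc in enumerate(HARD_LOCATIONS)
            if hamming(address, loc) <= RADIUS]


def store(contents, address, datum):
    """Add the datum to the contents row of every selected hard location."""
    for i in select(address):
        contents[i] = [c + d for c, d in zip(contents[i], datum)]


def retrieve(contents, address):
    """Sum the selected contents rows, then threshold at zero."""
    selected = select(address)
    sums = [sum(contents[i][j] for i in selected)
            for j in range(len(contents[0]))]
    output = [1 if s >= 0 else -1 for s in sums]  # tie-break is an assumption
    return sums, output


contents = [[0] * 4 for _ in HARD_LOCATIONS]
for address, datum in TRAINING_SET:
    store(contents, address, datum)

sums, output = retrieve(contents, [+1, +1, +1, +1, +1, -1])
print(contents)  # [[2, 0, 0, 0], [0, 0, 0, 2], [1, -1, 1, -1]]
print(sums)      # [3, -1, 1, -1]
print(output)    # [1, -1, 1, -1]
```

The retrieved vector is the datum stored with the second training address, which differs from the query address in a single bit.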
Radial basis function networks (Wikipedia)
- Tasks: function approximation, time series prediction, etc.
- Architecture
- Feedforward network
- Non-linear hidden layer of radial basis function units; activation:
`rho_i = e^(-beta_i ||vec x - vec c_i||^2)`
where `vec x` is an input vector, and `beta_i` and `vec c_i` are the width and center parameters of the radial basis functions
- Hidden units are approximately local; changing the parameters for one has little effect on the behavior of the others
- Linear output layer
`y_j = sum_i w_ji rho_i`
- Learning
- Radial basis function parameters selected
- Centers (`c_i`) selected on the basis of a sampling of inputs: either a random subset of the inputs or the centers of clusters of inputs
- Widths (`beta_i`) usually fixed to the same value; proportional to the maximum distance between the chosen centers
- Hidden-to-output weights trained
- Gradient descent on output error
`Delta w_ji = eta (t_j - y_j) rho_i`
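
The architecture and learning rule above can be sketched in plain Python. Only the activation, output, and weight-update formulas come from these notes. The toy task (fitting sin on [0, pi]), the choice of five inputs as centers, the shared `beta` of 2.0, the learning rate `eta` of 0.1, and all function names are assumptions made for illustration.

```python
import math


def rbf_activations(x, centers, betas):
    """rho_i = exp(-beta_i * ||x - c_i||^2) for each hidden unit."""
    return [math.exp(-b * sum((xk - ck) ** 2 for xk, ck in zip(x, c)))
            for c, b in zip(centers, betas)]


def forward(x, centers, betas, weights):
    """Linear output layer: y_j = sum_i w_ji * rho_i."""
    rho = rbf_activations(x, centers, betas)
    y = [sum(w * r for w, r in zip(row, rho)) for row in weights]
    return y, rho


def train_step(x, target, centers, betas, weights, eta):
    """One gradient-descent update: w_ji += eta * (t_j - y_j) * rho_i."""
    y, rho = forward(x, centers, betas, weights)
    for j, row in enumerate(weights):
        for i in range(len(row)):
            row[i] += eta * (target[j] - y[j]) * rho[i]


# Toy data (assumed): 11 samples of sin on [0, pi].
inputs = [[k * math.pi / 10] for k in range(11)]
targets = [[math.sin(x[0])] for x in inputs]

# Centers taken as a subset of the inputs; widths fixed to one shared value.
centers = [inputs[k] for k in (0, 2, 5, 8, 10)]
betas = [2.0] * len(centers)
weights = [[0.0] * len(centers)]  # one output unit, weights start at zero


def mean_squared_error():
    total = 0.0
    for x, t in zip(inputs, targets):
        y, _ = forward(x, centers, betas, weights)
        total += (y[0] - t[0]) ** 2
    return total / len(inputs)


error_before = mean_squared_error()
for _ in range(500):
    for x, t in zip(inputs, targets):
        train_step(x, t, centers, betas, weights, eta=0.1)
error_after = mean_squared_error()

print(error_before, error_after)
```

Only the hidden-to-output weights change during training; the centers and widths stay fixed once selected, which is why a simple delta rule on the linear output layer is sufficient here.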