Basic features of distributed (PDP) neural network models
- Control is distributed
- Representation
- Primitives
- Simple processing units that take a vector of numerical
inputs and yield a single numerical output
- Vectors of units
- Complex representations; means of combination
- An important constraint: with some exceptions, the system is not allowed to grow
- Complex representations are also vectors
- Combination is through superposition rather than concatenation
- Rules, if they exist at all, are represented implicitly or can
be seen as emergent behavior
- Learning mechanism(s) and long-term memory
- LTM consists of weights on connections joining units
- Various learning mechanisms, but learning is
always similarity-based and usually correlational
- Processing mechanism(s) and short-term memory
- STM is activation of units
- Means of accessing appropriate knowledge in LTM: parallel spread
of activation across weighted connections and (sometimes) settling (repeated adjustment of activations until none change)
- Means of applying LT knowledge (inference): given an input in the form of a pattern of activation across some units, other units are activated via weights
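To illustrate the contrast between superposition and concatenation noted above, here is a minimal sketch in plain Python. The vectors and the helper names `superpose` and `concatenate` are made up for illustration. The point is only that superposition keeps the representation the same size (consistent with the constraint that the system does not grow), while concatenation makes it longer.

```python
def superpose(a, b):
    """Combine two patterns by element-wise addition; the length is unchanged."""
    if len(a) != len(b):
        raise ValueError("superposition needs vectors of equal length")
    return [x + y for x, y in zip(a, b)]

def concatenate(a, b):
    """Combine two patterns by joining them end to end; the length grows."""
    return list(a) + list(b)

# Hypothetical activation patterns over the same four units.
red = [1.0, 0.0, 1.0, 0.0]
square = [0.0, 1.0, 1.0, 0.0]

red_square = superpose(red, square)   # still a pattern over four units
joined = concatenate(red, square)     # now needs eight units
```

Note that after superposition the two component patterns share units, so they can no longer be read off separately just by position.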
A neural network model
- State
- Vector of activations x(t)
- Matrix of weights W
- Task
- Set of input vectors I(t), possibly infinite,
clamped at the beginning of a presentation
- (Sometimes) an associated set of target vectors T(t)
- Dynamics
- Discrete (difference equations) or continuous (differential
equations)
- Activation
x(t+1) = g(h(x(t), W(t), I(t)))
where g is the activation function and h is the input function
- Weight
W(t+1) = f(x(t), W(t), I(t), T(t))
where f is the learning rule
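The discrete form of these dynamics can be sketched in a few lines of Python. The names `h`, `g`, and `f` follow the notes, but their bodies are assumptions chosen for illustration: the input function adds the external input to a weighted sum, the activation function is the logistic function, and the learning rule is a simple delta rule. Other choices fit the same two equations.

```python
import math

def h(x, W, I):
    """Input function (assumed form): weighted sum over units plus external input."""
    return [sum(W[j][i] * x[i] for i in range(len(x))) + I[j]
            for j in range(len(W))]

def g(net):
    """Activation function (assumed: logistic), applied to each unit's input."""
    return [1.0 / (1.0 + math.exp(-n)) for n in net]

def f(x, W, I, T, rate=0.5):
    """Learning rule (assumed: a delta rule). Each weight W[j][i] moves in
    proportion to the error at unit j and the activation of source unit i.
    I is unused by this particular rule; it is kept to match the signature."""
    return [[W[j][i] + rate * (T[j] - x[j]) * x[i] for i in range(len(x))]
            for j in range(len(W))]

def update(x, W, I, T):
    """One discrete time step: both updates read the state at time t."""
    return g(h(x, W, I)), f(x, W, I, T)

# Hypothetical two-unit example.
x0 = [1.0, 0.0]
W0 = [[0.0, 0.0], [0.0, 0.0]]
I0 = [0.0, 0.0]
T0 = [1.0, 1.0]
x1, W1 = update(x0, W0, I0, T0)
```

With all weights and inputs at zero, every unit's net input is zero, so both new activations are 0.5. The only weight that changes is the one from the active unit 0 to unit 1, which is the unit whose activation missed its target.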
Dimensions along which models vary
- Are input and output units separated in the network?
Does the network have a hidden layer?
- Is the network feedforward or does it settle to a stable state
as it runs?
- Feedforward networks
- Recurrent connections, settling (attractor) networks
- Simple recurrent (Elman) networks
- Is the network supervised or unsupervised?
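Of the architectures listed above, the simple recurrent (Elman) network is perhaps the least obvious, so here is a rough sketch of how one might run. The hidden units receive the current input together with a set of context units that hold a copy of the previous hidden activations. All function names, the weight values, the logistic activation function, and the all-zero initial context are assumptions made for illustration.

```python
import math

def logistic(n):
    return 1.0 / (1.0 + math.exp(-n))

def layer(sources, weights):
    """Activations of one layer: logistic of the dot product for each unit."""
    return [logistic(sum(w * s for w, s in zip(row, sources))) for row in weights]

def elman_step(inp, context, W_hidden, W_out):
    """One time step. Hidden units see the input units followed by the context
    units; the new hidden activations are returned to serve as the next context."""
    hidden = layer(inp + context, W_hidden)
    output = layer(hidden, W_out)
    return output, hidden

def run_sequence(inputs, W_hidden, W_out, n_hidden):
    context = [0.0] * n_hidden        # assumed starting context
    outputs = []
    for inp in inputs:
        out, context = elman_step(inp, context, W_hidden, W_out)
        outputs.append(out)
    return outputs

# Hypothetical hand-set weights: 1 input unit, 2 hidden units, 1 output unit.
W_hidden = [[1.0, 0.5, 0.0],          # each row: [input, context 0, context 1]
            [-1.0, 0.0, 0.5]]
W_out = [[1.0, -1.0]]

# The same input presented twice in a row.
outs = run_sequence([[1.0], [1.0]], W_hidden, W_out, n_hidden=2)
```

The two outputs differ even though the two inputs are identical, because the context units carry information from the first time step into the second.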
Running a neural network
- On a given task trial, an input vector (a pattern from the training set)
is clamped on a layer of input units.
- Activation is propagated through the network from the input units to the output units, if any, or throughout the unclamped nodes otherwise.
- Each unit calculates the input into it from the units that feed it.
The usual input function (h) is the dot product of the activations of the feeding (source)
units and the weights connecting them to the destination unit, that is,
h_j = ∑_i x_i w_{ji}
- The input also includes a bias term, reflecting the tendency for the unit
to be activated independent of its input from other units.
This can be implemented as a distinguished input unit which always has activation 1.0.
- The unit calculates its activation (output) as a function of its input.
Usually the activation function (g) "squashes" the input into a fixed range, either [-1, 1] or [0, 1].
Some possibilities:
- A step (threshold) function, for example,
g(h_j) = 0 if h_j is negative, 1 otherwise
- The sigmoidal (logistic) function:
g(h_j) = 1 / (1 + e^{-h_j})
- The bipolar version of the sigmoidal function:
g(h_j) = tanh(h_j) = (e^{h_j} - e^{-h_j}) / (e^{h_j} + e^{-h_j})
- Units are updated until the output layer is reached, if any, or until the activations are
stable.
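The steps above can be sketched in plain Python. This is only an illustration: the function names, the convention that the bias weight is the last entry of each weight row, the unit-by-unit update order in `settle`, and the hand-set example weights are all assumptions, not part of the notes.

```python
import math

def threshold(h):
    """Step function: 0 if the input is negative, 1 otherwise."""
    return 0.0 if h < 0 else 1.0

def sigmoid(h):
    """Logistic function; output lies between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-h))

def bipolar(h):
    """Bipolar version; output lies between -1 and 1."""
    return math.tanh(h)

def net_input(sources, weights):
    """Dot product of source activations and weights. The bias is a
    distinguished source unit with activation 1.0, appended here, so
    `weights` has one more entry than `sources`."""
    return sum(x * w for x, w in zip(sources + [1.0], weights))

def feedforward(pattern, layers, g=sigmoid):
    """Clamp `pattern` on the input units, then update one layer at a time
    until the output layer is reached."""
    activations = list(pattern)
    for W in layers:
        activations = [g(net_input(activations, row)) for row in W]
    return activations

def settle(x, W, g=threshold, max_sweeps=100):
    """Update units one at a time until no activation changes
    (or until max_sweeps is reached)."""
    x = list(x)
    for _ in range(max_sweeps):
        changed = False
        for j, row in enumerate(W):
            new = g(net_input(x, row))
            if new != x[j]:
                x[j], changed = new, True
        if not changed:
            break
    return x

# Hypothetical feedforward network with one hidden layer computing XOR.
xor_layers = [
    [[1.0, 1.0, -0.5],    # hidden unit 0: on if either input is on
     [1.0, 1.0, -1.5]],   # hidden unit 1: on only if both inputs are on
    [[1.0, -1.0, -0.5]],  # output: hidden 0 but not hidden 1
]
xor_out = [feedforward([a, b], xor_layers, g=threshold)[0]
           for a, b in [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]]

# Hypothetical settling network: two units that excite each other.
W_pair = [[0.0, 1.0, -0.5],
          [1.0, 0.0, -0.5]]
stable_off = settle([1.0, 0.0], W_pair)
stable_on = settle([1.0, 1.0], W_pair)
```

In the settling example the pair has two stable states, both units off or both on, and a mixed starting pattern falls into one of them.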
Five challenges for neural networks
- The binding problem: how can a neural network represent how features go together as aspects of different "objects"?
- The relation problem: how can a neural network represent the relation between objects that are present simultaneously?
- The variable problem: how can a neural network represent the notion of "sameness"?
- The structure problem: how can a neural network represent structure (part-whole relationships, embedding, recursion)?
- The short-term memory problem: how can a neural network respond to inputs that happened in the recent past as well as to the current input?