Contrastive Hebbian learning
- Boltzmann machines (Hinton & Sejnowski)
- Generalized Hopfield models with stochastic binary units; unit i turns on with probability
`p(x_i = 1) = 1 / (1 + e^(-(b_i + vec x * vec w_i)))`, where b_i is the unit's bias and vec w_i its incoming weights
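- Simulated annealing (starting with a high temperature T that divides the net input in the exponent, then gradually lowering it) may be used to speed up settling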
- Learning algorithm
- During the "positive" phase, clamp the input and output units, and let the hidden units settle; do Hebbian learning
- During the "negative" phase, clamp the input units only, and let the hidden and output units settle; do anti-Hebbian learning
- The learning rule: `Delta w_(ij) = eta (hat x_i^(+) hat x_j^(+) - hat x_i^(-) hat x_j^(-))`
(`hat x` denotes a unit's activation after settling; the superscripts (+) and (-) mark the positive and negative phases)
- The algorithm rewards correlations that agree with the data and punishes correlations supporting "fantasies" that differ from the data
- When positive and negative phase activations agree, weights no longer change
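- But learning is very slow: every weight update requires the network to settle (ideally to thermal equilibrium) in both phases, and the correlations must be estimated from noisy samples of the stochastic units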
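- Contrastive Hebbian learning proper: the deterministic, continuous-valued analogue of Boltzmann machines (each unit's activation is its probability of being on, rather than a sampled binary state)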
- Hetero-associative: unclamp the output units during the negative phase
- Auto-associative: unclamp all of the units during the negative phase
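The two phases and the learning rule above can be sketched in a few lines of Python. This is a minimal, hedged sketch of hetero-associative CHL with deterministic logistic units, not the notes' own implementation. Assumptions not stated in the notes: weights are symmetric with no self-connections, biases are learned as weights from an always-on unit, unclamped units start at 0.5 at the beginning of each phase, and settling is a fixed number of asynchronous sweeps. The names `settle`, `chl_update`, and `chl_step` are hypothetical. For the auto-associative variant, the negative phase would simply clamp no units.

```python
import math

# Minimal CHL sketch. Assumptions: symmetric weights, no self-connections,
# fixed number of settling sweeps, unclamped units start at 0.5.
# All function names are hypothetical.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def settle(W, b, x, clamped, sweeps=50):
    """Deterministic settling: each unclamped unit is repeatedly set to
    sigmoid(b_i + sum_j W[i][j] * x_j), its probability of being on,
    instead of being sampled as in a Boltzmann machine."""
    x = list(x)
    n = len(x)
    for _ in range(sweeps):
        for i in range(n):
            if i not in clamped:
                x[i] = sigmoid(b[i] + sum(W[i][j] * x[j] for j in range(n)))
    return x

def chl_update(W, b, x_pos, x_neg, eta):
    """Delta w_ij = eta * (x_i+ x_j+ - x_i- x_j-), applied to both W[i][j] and
    W[j][i]; biases are updated as weights from an always-on unit."""
    n = len(b)
    for i in range(n):
        for j in range(n):
            if i != j:
                W[i][j] += eta * (x_pos[i] * x_pos[j] - x_neg[i] * x_neg[j])
        b[i] += eta * (x_pos[i] - x_neg[i])

def chl_step(W, b, inputs, targets, eta=0.1):
    """One hetero-associative CHL step; units are ordered inputs, outputs, hidden."""
    n, n_in, n_out = len(b), len(inputs), len(targets)
    rest = [0.5] * (n - n_in - n_out)
    # Positive phase: clamp inputs and outputs; hidden units settle (Hebbian term)
    x_pos = settle(W, b, inputs + targets + rest, set(range(n_in + n_out)))
    # Negative phase: clamp inputs only; hidden and output units settle (anti-Hebbian term)
    x_neg = settle(W, b, inputs + [0.5] * n_out + rest, set(range(n_in)))
    chl_update(W, b, x_pos, x_neg, eta)
    return x_neg[n_in:n_in + n_out]  # free-running outputs before this update

# Usage: one input unit, one output unit, no hidden units.
W = [[0.0, 0.0], [0.0, 0.0]]
b = [0.0, 0.0]
for _ in range(200):
    out = chl_step(W, b, [1.0], [1.0], eta=0.5)
print(out)
```

In this tiny case the free-running output is pushed toward the target on every step; as the positive and negative phases come to agree, the weight changes shrink toward zero, matching the stopping condition described above.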
CHL and negative evidence
- The role of positive and negative evidence in learning:
Positive evidence allows learners to create more general
"hypotheses", including more possible patterns.
Negative evidence constrains the hypotheses, excluding
bad patterns.
- But negative evidence may not be available (for example, in
most language learning).
What prevents the learner from generating overly general hypotheses
that include bad patterns?
For language learners, various innate constraints have been proposed to
solve the problem.
- The negative phase in auto-associative CHL can be seen as a way of generating
candidate patterns and rejecting them, that is, a way of internally
generating negative evidence.
To the extent that a particular bad pattern is likely to be generated during the
negative phase, the network can learn to exclude it from its
hypothesis.
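The idea of internally generated negative evidence can be made concrete with a tiny example. This is a hedged sketch, not the notes' own model: it uses a fully visible Boltzmann machine (the stochastic relative of auto-associative CHL) with three units and no hidden units, and, as a stand-in for the fantasies a settling network would produce, it computes the negative-phase statistics exactly by enumerating all 2^3 states. Each unit's conditional probability in this distribution is the logistic rule given earlier. The names `state_probs` and `train` are hypothetical. The learner sees only positive evidence, yet the probability of the pattern (1, 0, 1), which never appears as a labeled negative example, is driven down because the model itself generates it during the negative phase.

```python
import itertools
import math

# Hedged sketch: fully visible Boltzmann machine, no hidden units. Negative-phase
# statistics are computed exactly by enumeration instead of by sampling fantasies.
# Function names are hypothetical.

def state_probs(W, b):
    """p(x) proportional to exp(sum_i b_i x_i + sum_{i<j} W_ij x_i x_j) over all binary states."""
    n = len(b)
    states = list(itertools.product([0, 1], repeat=n))
    weights = []
    for s in states:
        score = sum(b[i] * s[i] for i in range(n))
        score += sum(W[i][j] * s[i] * s[j] for i in range(n) for j in range(i + 1, n))
        weights.append(math.exp(score))
    z = sum(weights)
    return {s: w / z for s, w in zip(states, weights)}

def train(data, n, eta=0.5, steps=2000):
    W = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for _ in range(steps):
        probs = state_probs(W, b)  # negative phase: everything the model "fantasizes"
        for i in range(n):
            pos_i = sum(x[i] for x in data) / len(data)  # positive phase: all units clamped to data
            neg_i = sum(p * s[i] for s, p in probs.items())
            b[i] += eta * (pos_i - neg_i)
            for j in range(i + 1, n):
                pos_ij = sum(x[i] * x[j] for x in data) / len(data)
                neg_ij = sum(p * s[i] * s[j] for s, p in probs.items())
                W[i][j] += eta * (pos_ij - neg_ij)  # Hebbian minus anti-Hebbian term
                W[j][i] = W[i][j]
    return W, b

data = [(1, 1, 0), (0, 1, 1)]  # positive evidence only
W, b = train(data, n=3)
probs = state_probs(W, b)
print(probs[(1, 0, 1)])  # was 1/8 before training; this pattern was never labeled bad
```

Training starts from all-zero parameters, where every state, including (1, 0, 1), has probability 1/8. The anti-Hebbian term is weighted by how often the model produces each pattern, so the patterns the network is most likely to fantasize are the ones it is pushed hardest away from.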
Generative models (Hinton)
- Discriminative vs. generative models
- Discriminative models are designed to assign data to a small number of classes
- Generative models are designed to create a model of the data that could reproduce it
- "Visible" units where data are presented
- Hidden units which are responsible for learning feature detectors or for representing "causes" of the data (as in belief networks)
- The wake-sleep algorithm
- Separate recognition (bottom-up) and generative (top-down) weights
- During the "wake" phase, recognition weights drive the activations;
generative weights are trained with the delta rule to predict each layer's activations from the layer above (the targets are the hidden or input activations produced by recognition)
- During the "sleep" phase, generative weights drive the activations, producing a "fantasy";
recognition weights are trained with the delta rule to recover each layer's generated activations from the layer below
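The alternation of the two phases can be sketched with a single hidden layer. This is a minimal, hedged sketch rather than a definitive implementation. Assumptions not in the notes: logistic stochastic binary units, one hidden layer (the algorithm extends to more layers), a generative prior given by per-unit biases on the hidden layer, recognition weights initialized with small random values, and generative weights initialized to zero. The class `WakeSleep` and its method names are hypothetical.

```python
import math
import random

# Hedged wake-sleep sketch with one hidden layer of stochastic binary units.
# Class and method names are hypothetical.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sample(probs, rng):
    return [1.0 if rng.random() < p else 0.0 for p in probs]

class WakeSleep:
    """R, r: recognition (bottom-up) weights and hidden biases.
    G, g: generative (top-down) weights and visible biases.
    prior: generative biases of the hidden layer."""

    def __init__(self, n_vis, n_hid, seed=0):
        self.rng = random.Random(seed)
        self.R = [[self.rng.gauss(0.0, 0.1) for _ in range(n_vis)] for _ in range(n_hid)]
        self.r = [0.0] * n_hid
        self.G = [[0.0] * n_hid for _ in range(n_vis)]
        self.g = [0.0] * n_vis
        self.prior = [0.0] * n_hid

    def recognize(self, v):
        return [sigmoid(self.r[j] + sum(w * x for w, x in zip(self.R[j], v)))
                for j in range(len(self.r))]

    def generate(self, h):
        return [sigmoid(self.g[k] + sum(w * y for w, y in zip(self.G[k], h)))
                for k in range(len(self.g))]

    def wake(self, v, eta):
        h = sample(self.recognize(v), self.rng)  # recognition weights drive the activations
        p_v = self.generate(h)
        for k, (target, pred) in enumerate(zip(v, p_v)):  # delta rule on generative weights
            err = target - pred
            self.g[k] += eta * err
            for j, hj in enumerate(h):
                self.G[k][j] += eta * err * hj
        for j, hj in enumerate(h):  # the prior learns to predict the sampled hidden states
            self.prior[j] += eta * (hj - sigmoid(self.prior[j]))

    def sleep(self, eta):
        h = sample([sigmoid(p) for p in self.prior], self.rng)  # generative weights drive
        v = sample(self.generate(h), self.rng)                  # a "fantasy" visible pattern
        q = self.recognize(v)
        for j, (target, pred) in enumerate(zip(h, q)):  # delta rule on recognition weights
            err = target - pred
            self.r[j] += eta * err
            for k, vk in enumerate(v):
                self.R[j][k] += eta * err * vk

# Usage: learn to generate a single visible pattern.
model = WakeSleep(n_vis=4, n_hid=2)
for _ in range(2000):
    model.wake([1.0, 1.0, 0.0, 0.0], eta=0.5)
    model.sleep(eta=0.5)
print(model.generate([1.0, 1.0]))
```

Each phase uses only a local delta rule: the generative weights learn from targets supplied by recognition, and the recognition weights learn from fantasies supplied by the generative weights, so neither set of weights needs an error signal passed through the other.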