Networks in which location of output unit conveys
information
Input patterns can be ordered (on one or more dimensions) in some
metric or topological way with respect to features that are
implicit in the input
Output units have fixed positions in one-, two-, or
three-dimensional grids
Topology preserving map from the space of possible inputs to the
line, plane, or cube of the output units
A mapping that preserves neighborhood relations.
As two input patterns get closer in input space, the winning
output units get closer in output space.
Feature mapping architectures
Input units (like output units) are arranged in a one-, two-, or
three-dimensional array.
Dimensionality of the input and output spaces is normally the same.
The problem is to learn a continuous mapping between points "above"
each other in the two spaces.
Retinotopic and somatosensory maps (two-dimensional), tonotopic map
(one-dimensional).
Alternatively, input units are continuous-valued; together their outputs define the position of the input pattern
in input space.
Often the dimensionality of the input and output spaces is different.
Kohonen networks
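Competitive learning network with no dead units: the winner's neighbors are updated too, so units that never win are still pulled toward the inputs.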
Output neighborhoods
The neighborhood relations in the output array are built into the
learning rule.
Network as an elastic net in which the weight vector of the
winner is dragged toward the input vector and the weight
vectors of neighboring units are pulled along with it.
Nearby units respond to nearby input patterns.
Winning output unit j* (standard competitive learning rule):
`|vec w_(j^**) - vec x| le |vec w_j - vec x|` (for all j)
Both the neighborhood-size parameter (radius r or width σ) and the learning rate η start large and are decreased during training; a third parameter controls how fast these
two parameters decay.
Example mappings (input dimensionality to output dimensionality): 1-to-1, 2-to-1, 2-to-2; square or hexagonal cells in a two-dimensional output space
Gaps around the boundaries of input regions
Learning rule is sensitive to the probability of inputs as well as to their
location in input space: more output units are associated with regions of higher probability
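A minimal NumPy sketch of the Kohonen rule outlined above. It assumes the common Gaussian neighborhood `Lambda(j, j^**) = exp(-|vec r_j - vec r_(j^**)|^2 // 2 sigma^2)` over grid positions `vec r_j` and the update `Delta vec w_j = eta Lambda(j, j^**) (vec x - vec w_j)`. The exponential decay schedule and every name and parameter value (`train_som`, `eta0`, `sigma_end`, ...) are illustrative assumptions. In the toy run, the input density is four times higher on [0, 0.5) than on [0.5, 1), so more of the 20 units should settle in the dense half.

```python
import numpy as np


def train_som(data, grid_shape, n_steps=5000, eta0=0.5, eta_end=0.01,
              sigma_end=0.5, seed=0):
    """Minimal Kohonen net. data: (n_samples, n_features); returns (n_units, n_features) weights."""
    rng = np.random.default_rng(seed)
    # Fixed position of each output unit in a 1-, 2-, or 3-D grid
    coords = np.array(list(np.ndindex(*grid_shape)), dtype=float)
    sigma0 = max(grid_shape) / 2.0  # start with a wide neighborhood
    # Initialize the weights to randomly chosen training inputs
    w = data[rng.integers(0, len(data), len(coords))].astype(float)
    for t in range(n_steps):
        frac = t / n_steps
        # eta and sigma both start large and decay; this exponential schedule is an assumption
        eta = eta0 * (eta_end / eta0) ** frac
        sigma = sigma0 * (sigma_end / sigma0) ** frac
        x = data[rng.integers(0, len(data))]
        # Winner j*: the unit whose weight vector is closest to the input
        winner = np.argmin(np.sum((w - x) ** 2, axis=1))
        # Gaussian neighborhood Lambda(j, j*), measured between positions in the output grid
        lam = np.exp(-np.sum((coords - coords[winner]) ** 2, axis=1) / (2.0 * sigma ** 2))
        # Kohonen rule: Delta w_j = eta * Lambda(j, j*) * (x - w_j)
        w += eta * lam[:, None] * (x - w)
    return w


# Toy run: input density is 4x higher on [0, 0.5) than on [0.5, 1)
rng = np.random.default_rng(1)
data = np.concatenate([rng.uniform(0.0, 0.5, 800), rng.uniform(0.5, 1.0, 200)])[:, None]
w = train_som(data, (20,))
n_low = int(np.sum(w[:, 0] < 0.5))
print("units in dense half:", n_low, "units in sparse half:", 20 - n_low)
```

Because each update moves a weight a fraction `eta Lambda <= 0.5` of the way toward an input, every weight stays a convex combination of training inputs and so remains inside the input range.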
Convergence of Kohonen nets
Usually in two stages: (1) untangling, (2) detailed adapting
Kinds of tangles: twists (2 dimensions), kinks (1 dimension)
In one dimension
Boundaries (kinks) can move one way or another when they are near the
winning unit, but only one step at a time.
For untangling to be complete, a kink must move to the edge.
With a symmetric neighborhood function, this takes
on the order of `N^3` updates for a chain of N output units.
Asymmetric neighborhood function can speed up learning.
Monotonically increasing or decreasing sequences of weights remain so at each update
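The one-dimensional claims can be checked numerically. This sketch assumes a rectangular neighborhood of radius r with the same η for every unit inside it. Under that assumption an ordered chain provably stays ordered: units inside the window all contract toward x by the same factor, and a unit at the edge of the window cannot overtake its outside neighbor without contradicting the choice of winner. `count_kinks` and `kohonen_step_1d` are hypothetical helper names.

```python
import numpy as np


def count_kinks(w):
    """Number of places where a 1-D chain of scalar weights reverses direction."""
    steps = np.sign(np.diff(w))
    steps = steps[steps != 0]  # ignore exact ties between neighboring weights
    return int(np.sum(steps[1:] != steps[:-1]))


def kohonen_step_1d(w, x, eta, r):
    """One update with a rectangular neighborhood of radius r (same eta for every unit in the window)."""
    w = w.copy()
    j_star = int(np.argmin(np.abs(w - x)))
    lo, hi = max(0, j_star - r), min(len(w), j_star + r + 1)
    w[lo:hi] += eta * (x - w[lo:hi])
    return w


rng = np.random.default_rng(0)
w = np.sort(rng.uniform(0.0, 1.0, 20))  # an already untangled (monotonic) chain
for _ in range(500):
    w = kohonen_step_1d(w, rng.uniform(0.0, 1.0), eta=0.1, r=2)
print(count_kinks(w))  # the chain is still monotonic, so no kinks
```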
Problem of searching the whole output space for the winner
Several hierarchical layers of increasing size, each trained on the same input patterns
One layer trained at a time, starting from smallest layer
For each layer, the winner is searched for only within the neighborhood defined
by the winner in the next smaller layer
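A sketch of the hierarchical winner search. It assumes each layer is a square grid whose side is an integer multiple of the previous layer's side, and that the candidates in a larger layer are the units lying under the previous winner plus a one-unit margin. The `grid_weights` layers are hypothetical stand-ins for trained layers, chosen so the result can be compared with an exhaustive search of the largest layer.

```python
import numpy as np


def grid_weights(n):
    """Hypothetical trained n x n layer: unit (i, j) sits at the center of cell (i, j) of the unit square."""
    centers = (np.arange(n) + 0.5) / n
    return np.stack(np.meshgrid(centers, centers, indexing="ij"), axis=-1)  # shape (n, n, 2)


def exhaustive_winner(w, x):
    d2 = np.sum((w - x) ** 2, axis=-1)
    i, j = np.unravel_index(np.argmin(d2), d2.shape)
    return int(i), int(j)


def hierarchical_winner(layers, x, margin=1):
    """Search the whole smallest layer, then only near the scaled-up winner in each larger layer."""
    winner = exhaustive_winner(layers[0], x)
    for prev, w in zip(layers, layers[1:]):
        n = w.shape[0]
        s = n // prev.shape[0]  # assumed integer size ratio between consecutive layers
        i0, j0 = winner
        rows = range(max(0, s * i0 - margin), min(n, s * (i0 + 1) + margin))
        cols = range(max(0, s * j0 - margin), min(n, s * (j0 + 1) + margin))
        winner = min(((i, j) for i in rows for j in cols),
                     key=lambda ij: np.sum((w[ij] - x) ** 2))
    return winner


layers = [grid_weights(n) for n in (2, 4, 8)]
x = np.array([0.3, 0.8])
print(hierarchical_winner(layers, x), exhaustive_winner(layers[-1], x))
```

In the 8×8 layer this examines at most 12 candidate units instead of all 64.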
Supervised Kohonen nets
A set of teaching units is included along with the inputs during
training
During testing, the teaching units are absent, and the weights from them are ignored when finding the winner
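A sketch of how the teaching units enter and leave the computation. It assumes one-hot teaching inputs appended to each training input. The hand-set weights are hypothetical, and reading the class off the winner's teaching-unit weights at test time is an assumption of this sketch.

```python
import numpy as np


def with_teacher(x, label, n_classes, scale=1.0):
    """Training input: the ordinary input plus one-hot teaching units (scale is an assumed free parameter)."""
    teacher = np.zeros(n_classes)
    teacher[label] = scale
    return np.concatenate([x, teacher])


def classify(w, x):
    """Testing: the teaching-unit weights are left out when finding the winner.
    Reading the class from the winner's teaching weights is an assumption of this sketch."""
    n_in = len(x)
    winner = np.argmin(np.sum((w[:, :n_in] - x) ** 2, axis=1))
    return int(np.argmax(w[winner, n_in:]))


# Hypothetical weights of a 3-unit map trained on 2-D inputs with 2 classes
w = np.array([
    [0.0, 0.1, 0.9, 0.1],
    [0.5, 0.5, 0.5, 0.5],
    [1.0, 0.9, 0.1, 0.9],
])
print(classify(w, np.array([0.1, 0.0])), classify(w, np.array([0.9, 1.0])))  # 0 1
```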
Kohonen nets: applications
Phoneme similarity
US Congress voting patterns
Traveling Salesman Problem
Associating separate maps: modeling bilingual language learning (Li and Farkaš)
Separate phonological and semantic maps trained on data from a child learning English and Cantonese
Corresponding regions in the two maps associated via Hebbian learning
Regions in maps correspond to the two languages and parts of speech (in the semantic map)
Priming one map from the other results in errors similar to those made by bilingual learners
Function approximation; example: pole balancing
Three inputs: `theta`, `(d theta)/(dt)`, and the force required to balance the pole,
`f(theta, (d theta)/(dt)) = alpha sin theta + beta (d theta)/(dt)`
Two-dimensional output layer trained on input triples
Trained network acts as a lookup table: given `theta` and `(d theta)/(dt)`, the winning unit is the one whose first two weights best match them, and its third weight is the value of the function.
If parameters in the system change, the network behaves like an adaptive table.
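A sketch of the lookup-table idea. It assumes hypothetical constants `alpha = beta = 1`, an operating range `|theta| le 0.5`, `|(d theta)/(dt)| le 1`, and a 10×10 output grid. Training matches on the full triple, and the lookup matches only the first two weights and returns the third. The schedules and names (`train_table`, `lookup`) are assumptions.

```python
import numpy as np

ALPHA, BETA = 1.0, 1.0  # hypothetical constants in f = alpha*sin(theta) + beta*dtheta/dt


def force(theta, omega):
    return ALPHA * np.sin(theta) + BETA * omega


def train_table(grid=10, n_steps=10_000, seed=0):
    rng = np.random.default_rng(seed)
    # Training triples (theta, dtheta/dt, f) drawn from an assumed operating range
    theta = rng.uniform(-0.5, 0.5, 5000)
    omega = rng.uniform(-1.0, 1.0, 5000)
    data = np.column_stack([theta, omega, force(theta, omega)])
    coords = np.array(list(np.ndindex(grid, grid)), dtype=float)
    w = data[rng.integers(0, len(data), len(coords))].copy()
    for t in range(n_steps):
        frac = t / n_steps
        eta = 0.5 * (0.01 / 0.5) ** frac
        sigma = (grid / 2) * (0.5 / (grid / 2)) ** frac
        x = data[rng.integers(0, len(data))]
        winner = np.argmin(np.sum((w - x) ** 2, axis=1))  # winner on the full triple
        lam = np.exp(-np.sum((coords - coords[winner]) ** 2, axis=1) / (2 * sigma ** 2))
        w += eta * lam[:, None] * (x - w)
    return w


def lookup(w, theta, omega):
    # Match only the first two weights, then read the third weight as f
    winner = np.argmin((w[:, 0] - theta) ** 2 + (w[:, 1] - omega) ** 2)
    return w[winner, 2]


w = train_table()
print(lookup(w, 0.1, 0.3), force(0.1, 0.3))
```

Every weight vector is a convex combination of training triples, so its third weight stays close to `f` evaluated at its first two weights (here within about 0.04, the curvature of `sin theta` on the range). That is why the table gives sensible values between training points. Retraining on new triples after the constants change is what makes it adaptive.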
Inverse kinematics
Arm with two hinge joints which can reach all of the positions on a square table
Square grid of Kohonen net output layer associated with positions on the table
Inputs to network are joint angles
Arm placed at random positions on the table; the winning unit is the one corresponding to the arm's table position,
not the one whose weight vector is closest to the input joint angles
Trained network can solve inverse kinematics problem: given a position on the table, a unit is selected, and its weights (the joint angles) are read off
Given a path connecting two points on the table, the joint angles can be read off from the units in the grid along the path.
Trained with an obstacle on the table, the network learns to "avoid" it: units near the obstacle
are pushed away from it in input space. The trained network avoids the obstacle when planning a path connecting points on either side of it.
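A sketch of the inverse-kinematics training scheme. It assumes a planar two-link arm with unit link lengths, a hypothetical square table [0.2, 1.2] × [0.2, 1.2] inside its reach, and a single elbow orientation (second joint angle in (0, π)), so each table position has one set of joint angles. The winner is chosen from the hand's table position, and the weights (joint angles) are pulled toward the measured angles. The two-phase schedule and all names are assumptions, and the obstacle case is not modeled.

```python
import numpy as np

L1, L2 = 1.0, 1.0                # hypothetical link lengths
TABLE_MIN, TABLE_MAX = 0.2, 1.2  # hypothetical square table inside the arm's reach
GRID = 10                        # 10 x 10 output units


def hand_position(angles):
    """Forward kinematics of a planar two-hinge arm: joint angles (..., 2) -> hand (x, y)."""
    t1, t2 = angles[..., 0], angles[..., 1]
    return np.stack([L1 * np.cos(t1) + L2 * np.cos(t1 + t2),
                     L1 * np.sin(t1) + L2 * np.sin(t1 + t2)], axis=-1)


# Each output unit is tied to a fixed table position and a fixed grid position
ticks = np.linspace(TABLE_MIN, TABLE_MAX, GRID)
table_pos = np.array([(x, y) for x in ticks for y in ticks])
grid_pos = np.array(list(np.ndindex(GRID, GRID)), dtype=float)


def nearest_unit(p):
    return int(np.argmin(np.sum((table_pos - p) ** 2, axis=1)))


def train(seed=0):
    rng = np.random.default_rng(seed)
    # Random arm placements: random joint angles (one elbow orientation), kept if the hand is over the table
    angles = np.column_stack([rng.uniform(-np.pi, np.pi, 400_000),
                              rng.uniform(0.0, np.pi, 400_000)])
    pos = hand_position(angles)
    keep = np.all((pos >= TABLE_MIN) & (pos <= TABLE_MAX), axis=1)
    angles, pos = angles[keep], pos[keep]
    w = np.zeros((GRID * GRID, 2))  # each unit's weights are joint angles
    for phase in (1, 2):
        for step, k in enumerate(rng.permutation(len(angles))):
            # Winner: the unit for the table position the hand reached,
            # not the unit whose weights are closest to the input joint angles
            win = nearest_unit(pos[k])
            if phase == 1:
                # Ordering phase: shrinking Gaussian neighborhood on the output grid
                frac = step / len(angles)
                eta, sigma = 0.5 * 0.1 ** frac, 3.0 * (0.5 / 3.0) ** frac
                lam = np.exp(-np.sum((grid_pos - grid_pos[win]) ** 2, axis=1) / (2 * sigma ** 2))
                w += eta * lam[:, None] * (angles[k] - w)
            else:
                # Fine-tuning phase: winner only, small constant learning rate
                w[win] += 0.05 * (angles[k] - w[win])
    return w


def joint_angles_for(w, p):
    """Inverse kinematics by lookup: select the unit for table position p and read off its weights."""
    return w[nearest_unit(p)]


w = train()
target = np.array([0.6, 0.9])
print(joint_angles_for(w, target), hand_position(joint_angles_for(w, target)))
# Reading off joint angles from the units along a straight path across the table
path = np.linspace([0.3, 0.3], [1.1, 1.0], 8)
angles_along_path = np.array([joint_angles_for(w, p) for p in path])
```

Restricting the elbow to one side matters: averaging the angles of the two mirror-image arm configurations that reach the same point would give angles that reach neither.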
Content-based image retrieval: separate tree-structured SOMs for image features (color, texture, etc.) trained on a database of images;
each node in each layer associated with the image(s) closest to its weight vector
User presented with a small set of thumbnail images, selects subset
Regions in maps for selected and rejected images are assigned positive or negative values,
and a convolution mask is applied to the maps based on these values
New, previously unseen images from positive regions of maps presented to user
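A sketch of the relevance-feedback step on a single feature map. It assumes each image is summarized by the grid position of its best-matching unit, +1/-1 marks for selected/rejected images, and a small symmetric kernel as the convolution mask. How the scores from the separate maps are combined is not specified here (summing them is one option). The image ids and helper names are hypothetical.

```python
import numpy as np


def convolve2d_same(a, kernel):
    """Zero-padded 'same'-size 2-D convolution; the kernel is assumed odd-sized and symmetric."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(a, ((ph, ph), (pw, pw)))
    out = np.zeros(a.shape)
    for di in range(kh):
        for dj in range(kw):
            out += kernel[di, dj] * padded[di:di + a.shape[0], dj:dj + a.shape[1]]
    return out


def rank_unseen(bmu, map_shape, selected, rejected, kernel):
    """bmu: image id -> (row, col) of the image's best-matching unit on one feature map."""
    marks = np.zeros(map_shape)
    for img in selected:
        marks[bmu[img]] += 1.0  # positive value where the user selected an image
    for img in rejected:
        marks[bmu[img]] -= 1.0  # negative value where the user rejected one
    # Spread the marks to neighboring units with the convolution mask
    scores = convolve2d_same(marks, kernel)
    seen = set(selected) | set(rejected)
    unseen = [img for img in bmu if img not in seen]
    # Present unseen images from the most positive regions first
    return sorted(unseen, key=lambda img: scores[bmu[img]], reverse=True)


kernel = np.array([[0.25, 0.5, 0.25],
                   [0.5, 1.0, 0.5],
                   [0.25, 0.5, 0.25]])
bmu = {"a": (1, 1), "b": (3, 3), "c": (1, 2), "d": (3, 2), "e": (0, 4)}  # hypothetical image ids
print(rank_unseen(bmu, (5, 5), selected=["a"], rejected=["b"], kernel=kernel))  # ['c', 'e', 'd']
```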
Retinotopic maps
Positions of the units in output space are not represented explicitly
as in Kohonen nets, but implicitly through their connections to the input
units and/or each other.
Two ways to achieve the mapping
Built-in receptive fields connecting layers of units
Multiple layers of units, with each unit in a given layer (say, M) connected to
multiple units "below" it in the lower (more peripheral) layer (L)
The receptive fields of neighboring units overlap: a single L unit
may connect to multiple M units
The density of connections into a higher cell decreases with distance from the underlying point in the lower layer.
Activation of a unit in the higher layer is a linear function of the lower-layer activations and the weights connecting them:
`x_j = alpha + sum_i w_(ji) x_i`
Unsupervised Hebbian learning refines the connections
`Delta w_(ji) = a x_i x_j + b x_i + c x_j + d`
with weights constrained to fall within the range [-β, +β]
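A sketch of repeated updates with this rule for one higher-layer unit, with a learning rate η added (an assumption). For white-noise inputs with zero mean and unit variance, `E[x_i x_j] = w_(ji)` and `E[x_j] = alpha`, so the expected change is `a w_(ji) + c alpha + d`. With the hypothetical values below (`alpha = 0`, a = 0.1, b = c = 0, d = 0.05), this is positive for every nonnegative weight, so all weights climb to +β. That matches the layer-B behavior these notes describe.

```python
import numpy as np


def linsker_step(w, x_in, alpha, a, b, c, d, eta, beta):
    # Linear activation of the higher-layer unit: x_j = alpha + sum_i w_ji x_i
    x_j = alpha + w @ x_in
    # Hebbian-type rule: Delta w_ji = a x_i x_j + b x_i + c x_j + d (scaled by an assumed learning rate)
    dw = a * x_in * x_j + b * x_in + c * x_j + d
    # Keep every weight in [-beta, +beta]
    return np.clip(w + eta * dw, -beta, beta)


rng = np.random.default_rng(0)
w = np.zeros(16)  # weights from 16 lower-layer units
for _ in range(10_000):
    x = rng.standard_normal(16)  # white-noise activity in the lower layer
    w = linsker_step(w, x, alpha=0.0, a=0.1, b=0.0, c=0.0, d=0.05, eta=0.005, beta=1.0)
print(w.round(3))
```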
Multilayer network presented with white noise at the input layer
Sequence of feature-analyzing unit types emerges as one layer after another "matures"
In layer B, weights tend to reach their maximum positive value; each unit responds to the average activity over its receptive field,
so neighboring units, whose receptive fields overlap, have correlated activations,
and the layer's activation is a blurred image of the random "snow" at the input
In layer C (and, given particular parameters, in subsequent layers), on-center, off-surround units (contrast-sensitive filters) emerge with "Mexican hat" receptive fields:
some respond maximally to brightness surrounded by darkness, others to the reverse
In higher layers, given particular parameter values, orientation-selective units emerge, responding to a bright bar or edge against a dark background, or the reverse
Connections within layers implement the "Mexican hat function" (on-center, off-surround
connections).
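One common way to model the Mexican hat function is a difference of Gaussians: a narrow excitatory Gaussian minus a broader, weaker inhibitory one. The notes do not fix the functional form, so the shape and the widths and amplitudes below are assumptions. The sketch builds the lateral weight matrix of a 1-D layer from it.

```python
import numpy as np


def mexican_hat(d, sigma_e=1.0, sigma_i=3.0, k_e=1.0, k_i=0.5):
    """Difference of Gaussians: narrow excitation minus broad inhibition (parameter values are assumptions)."""
    d = np.asarray(d, dtype=float)
    return k_e * np.exp(-d**2 / (2 * sigma_e**2)) - k_i * np.exp(-d**2 / (2 * sigma_i**2))


# Lateral weights of a 1-D layer of 15 units: w_ij depends only on the distance |i - j|
idx = np.arange(15)
lateral = mexican_hat(np.abs(idx[:, None] - idx[None, :]))
print(mexican_hat([0, 1, 3, 6]).round(3))  # excitatory near 0, inhibitory farther out
```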
Threshold on outputs; only a few fire for a given input pattern.
Weights trained with a Hebbian rule
`Delta w_{ji} prop x_i x_j`
To prevent the weights from blowing up, the weights into each output unit are normalized.
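A sketch of one training step combining the output threshold, the Hebbian rule, and normalization. It assumes that only the `n_active` most strongly driven output units fire and that normalization keeps the sum of each unit's incoming weights at 1 (one common choice; dividing by the vector length would also bound the weights). Lateral connections are left out, and the names and values are hypothetical.

```python
import numpy as np


def hebbian_step(W, x, eta=0.1, n_active=2):
    """W: (n_out, n_in) nonnegative weights; x: nonnegative input activations."""
    y = W @ x
    # Threshold on outputs: only the n_active most strongly driven units fire (n_active is an assumption)
    fired = np.zeros_like(y)
    top = np.argsort(y)[-n_active:]
    fired[top] = y[top]
    # Hebbian rule: Delta w_ji proportional to x_i * x_j
    W = W + eta * np.outer(fired, x)
    # Normalize the total weight into each output unit so weights cannot blow up
    return W / W.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
W = rng.uniform(0.1, 1.0, (6, 10))
W /= W.sum(axis=1, keepdims=True)
for _ in range(100):
    x = np.zeros(10)
    x[rng.choice(10, size=2, replace=False)] = 1.0  # a few random inputs turned on
    W = hebbian_step(W, x)
print(W.sum(axis=1))
```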
Adjacent activated regions in lower layer tend to activate adjacent regions in upper layer;
separated activated regions in lower layer tend not to activate adjacent regions in upper layer.
At each step of training, a few random input units are turned on.
Input units near an active unit are also activated, more strongly than distant ones, through their excitatory connections.
Neighboring input units will tend to become associated with neighboring output units
because neighboring output units reinforce one another.
As many neighboring regions of input space become associated with neighboring regions of output space, the result is an ordered topographic map.
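But there are 8 possible orientations of the mapping: for a square output array, 4 rotations, each with or without a reflection, all equally neighborhood-preserving.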
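The count of 8 is the number of symmetries of a square array. This sketch enumerates them for a small, asymmetric 3×3 grid of output-unit labels (a hypothetical example). Each result is an equally valid neighborhood-preserving relabeling of the output units.

```python
import numpy as np


def orientations(grid):
    """All rotations and reflections of a square array (the 8 symmetries of a square)."""
    out = []
    for g in (grid, grid.T):    # without and with a reflection
        for k in range(4):      # 0, 90, 180, 270 degree rotations
            out.append(np.rot90(g, k))
    return out


labels = np.arange(9).reshape(3, 3)  # output unit indices of a 3 x 3 map
maps = orientations(labels)
print(len({m.tobytes() for m in maps}))  # 8 distinct orientations
```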
Problem of local maxima: different regions can develop partial maps that are incompatible with one another
Solution: Restrict initial development to one region, from which a continuous mapping then spreads out over both surfaces
Problem of how to get the final map to be oriented in the "correct" way
Polarity marker: initial region in input space that has strong connections to a region in output space in the correct orientation (not necessarily the final correct region in output space)
Trained on elongated bar-like patterns, centered on the middle of
the image, and presented in different orientations, learning converges
on a mapping from angle of orientation to position in the output
layer.