Local(ized) representations
- Local(ized) representations associate a single representational element with each of the things that are to be represented
- In symbolic models, symbols are local representations.
- In neural networks, we can create local representations by assigning a separate unit to each thing within the layer of units dedicated to the class of things being represented.
- Local representations are orthogonal to one another.
- The system's behavior in response to one local representation is completely independent of its behavior in response to another local representation.
- Local representations do not support generalization.
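The independence claim can be checked with a minimal sketch. It assumes, purely for illustration, a single linear output unit trained with the delta rule on one-hot (local) input vectors. The item names, learning rate, and epoch count are hypothetical. Because distinct codes are orthogonal, training on one item only ever changes that item's weight, so the unit's response to every other item stays at zero.

```python
# Hypothetical items, each assigned its own unit (a local, one-hot code).
ITEMS = ["cat", "dog", "car", "cup"]

def local_code(item):
    """Return a vector with exactly one active unit, the one for `item`."""
    return [1.0 if other == item else 0.0 for other in ITEMS]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train_linear_unit(pattern, target, weights, lr=0.5, epochs=20):
    """Delta-rule training of one linear output unit on a single pattern."""
    for _ in range(epochs):
        error = target - dot(weights, pattern)
        weights = [w + lr * error * x for w, x in zip(weights, pattern)]
    return weights

# Distinct local codes are orthogonal: their dot product is zero.
print(dot(local_code("cat"), local_code("dog")))  # 0.0

weights = train_linear_unit(local_code("cat"), 1.0, [0.0] * len(ITEMS))
print(round(dot(weights, local_code("cat")), 3))  # 1.0
print(dot(weights, local_code("dog")))            # 0.0, nothing transferred
```

Any learning rule that scales weight changes by the input activity would behave the same way here, since inactive units receive no update.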
Distributed representations
- With distributed representations, representational elements are shared across different things being represented.
- Each thing is represented with more than one representational element, and
each representational element takes part in the representation of more than one thing.
- In symbolic models, symbol structures can be seen as distributed representations:
each symbol structure is represented in terms of the primitive symbols it is made up of.
- In neural networks, we can create distributed representations by assigning multiple units to each thing being represented.
- The system responds to a distributed representation on the basis of its overlap with other distributed representations.
- Distributed representations permit generalization.
- Coarse-coding: representation of inputs along one or more dimensions by feature detectors with overlapping receptive fields.
- Generalization depends on the size and shape of the feature detectors; acuity depends on the number of feature detectors (not their size).
- Coarse-coding permits generalization about location in input space.
- Emergent distributed representations on the hidden layer of a trained neural network
- Disadvantages of hidden-layer distributed representations: catastrophic forgetting
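Coarse coding can be illustrated with a minimal one-dimensional sketch. It assumes binary feature detectors whose receptive fields are intervals of a fixed radius, evenly spaced over a hypothetical input range of 0 to 10; the radius values and detector counts are illustrative choices. Shared active detectors stand in for generalization: wider fields let a point share detectors with points farther away, while more detectors of the same size let the code separate points that fewer detectors would confuse.

```python
def make_detectors(n, lo=0.0, hi=10.0):
    """Return n evenly spaced receptive-field centers spanning [lo, hi]."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]

def coarse_code(x, centers, radius):
    """Binary code: a detector is on when x lies inside its receptive field."""
    return [1 if abs(x - c) <= radius else 0 for c in centers]

def overlap(u, v):
    """Count the detectors that are active for both codes."""
    return sum(a & b for a, b in zip(u, v))

coarse = make_detectors(11)  # centers at 0, 1, ..., 10

# Narrow fields: nearby points share detectors, distant points share none.
print(overlap(coarse_code(5.0, coarse, 1.5), coarse_code(5.5, coarse, 1.5)))  # 3
print(overlap(coarse_code(5.0, coarse, 1.5), coarse_code(8.0, coarse, 1.5)))  # 0

# Wider fields: generalization reaches points farther away.
print(overlap(coarse_code(5.0, coarse, 3.0), coarse_code(8.0, coarse, 3.0)))  # 4

# Acuity: with 11 detectors, 5.0 and 5.2 get identical codes;
# with 101 detectors of the same size they can be told apart.
fine = make_detectors(101)
print(coarse_code(5.0, coarse, 1.5) == coarse_code(5.2, coarse, 1.5))  # True
print(coarse_code(5.0, fine, 1.5) == coarse_code(5.2, fine, 1.5))      # False
```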
Examples
- Visual input with 3 pixels, 5 colors (red, green, blue, yellow, black)
- Local representation: a single element for each possible visual input (5^3 = 125 elements); no generalization to novel inputs
- Single representational element for each combination of pixel and color, 5 X 3 = 15 elements in all
- Generalization to novel inputs on the basis of similarity to trained inputs
- Presented with the novel input RED - RED - BLACK, the system would respond on the basis of its
previous training with RED - GREEN - BLACK and BLUE - RED - BLACK
- But are these similarities the "right" ones to base generalization on?
- Color represented locally within each pixel
- 3 elements per pixel to represent its color (R, G, B), 3 X 3 = 9 elements in all
- Yellow in a given pixel: R and G elements turned on, so yellow is always treated as partly similar to red and green
- YELLOW - YELLOW - BLACK more similar to GREEN - RED - BLACK
than to BLUE - BLUE - BLACK
- Is this the "right" kind of generalization?
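The three encodings of the 3-pixel, 5-color input can be compared by counting shared active elements. This is a sketch under stated assumptions: overlap is measured as a simple dot product (one possible similarity measure among several), and the function names are hypothetical. It reproduces the claims above: the local code gives RED - RED - BLACK no overlap with either trained input, the pixel-by-color code ties it equally to both, and only the RGB code makes YELLOW - YELLOW - BLACK closer to GREEN - RED - BLACK than to BLUE - BLUE - BLACK.

```python
COLORS = ["RED", "GREEN", "BLUE", "YELLOW", "BLACK"]
RGB = {"RED": (1, 0, 0), "GREEN": (0, 1, 0), "BLUE": (0, 0, 1),
       "YELLOW": (1, 1, 0), "BLACK": (0, 0, 0)}

def local_code(scene):
    """One element per possible scene: 5 ** 3 = 125 elements, exactly one on."""
    index = 0
    for color in scene:
        index = index * len(COLORS) + COLORS.index(color)
    code = [0] * len(COLORS) ** len(scene)
    code[index] = 1
    return code

def pixel_color_code(scene):
    """One element per (pixel, color) pair: 3 x 5 = 15 elements."""
    return [1 if color == c else 0 for color in scene for c in COLORS]

def rgb_code(scene):
    """Three elements (R, G, B) per pixel: 3 x 3 = 9 elements."""
    return [bit for color in scene for bit in RGB[color]]

def overlap(u, v):
    """Number of elements active in both codes."""
    return sum(a * b for a, b in zip(u, v))

novel = ("RED", "RED", "BLACK")
for trained in [("RED", "GREEN", "BLACK"), ("BLUE", "RED", "BLACK")]:
    print(overlap(local_code(novel), local_code(trained)),
          overlap(pixel_color_code(novel), pixel_color_code(trained)))  # 0 2

yyb = ("YELLOW", "YELLOW", "BLACK")
for other in [("GREEN", "RED", "BLACK"), ("BLUE", "BLUE", "BLACK")]:
    print(overlap(pixel_color_code(yyb), pixel_color_code(other)),
          overlap(rgb_code(yyb), rgb_code(other)))  # 1 2, then 1 0
```

In the RGB scheme black turns on no elements, so it contributes nothing to overlap; in the pixel-by-color scheme it has its own element in each pixel.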
- Counting objects in a visual scene
- Treat input space as direct representation of visual space: turn on all elements corresponding to objects in the scene
- A distributed representation of scenes because for each scene
multiple elements are turned on
- Local representation scheme would assign a single representational element to each different visual scene, an admittedly absurd way to represent the input
- Difficult to learn to count because of the invariance problem
- Size invariance: objects of different sizes will have different units turned on.
- Position invariance: objects in different positions may have completely different units turned on.
- Preprocessing: transform raw inputs into patterns in which a single unit at the center of mass is turned on for each object (more local, but still a distributed representation)
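The preprocessing step can be sketched in one dimension. The assumptions are simplifications for illustration: the scene is a hypothetical row of 10 pixels, an object is a run of adjacent active pixels, and each run's center is rounded down to an integer position. Summing the raw pattern reflects object size, while summing the preprocessed pattern gives the object count regardless of size.

```python
def find_objects(scene):
    """Return (start, end) index pairs for each run of active pixels."""
    objects, start = [], None
    for i, pixel in enumerate(scene + [0]):  # sentinel closes a trailing run
        if pixel and start is None:
            start = i
        elif not pixel and start is not None:
            objects.append((start, i - 1))
            start = None
    return objects

def center_of_mass_code(scene):
    """Preprocess: one active unit at each object's (rounded-down) center."""
    code = [0] * len(scene)
    for start, end in find_objects(scene):
        code[(start + end) // 2] = 1
    return code

small = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]  # two 1-pixel objects
large = [0, 0, 0, 1, 1, 1, 0, 1, 1, 1]  # two 3-pixel objects
print(sum(small), sum(large))                                            # 2 6
print(sum(center_of_mass_code(small)), sum(center_of_mass_code(large)))  # 2 2
```

Objects that touch would be merged into one run by this sketch, and the preprocessed units still depend on object position, which is why the result is more local but still distributed.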
- Representing the numbers that are the output of the network
- Local scheme: single output unit for each number
- Problem: the range of representable numbers is limited by the number of output units
- One distributed scheme: represent numbers in binary fashion, with each output unit representing a power of 2 (FOUR is 0 1 0 0, SEVEN is 0 1 1 1)
- Still a limit on the numbers that can be represented, but many more are representable with fewer elements (n units cover 2^n numbers instead of n)
- But network would be trained to associate input patterns with features
of numbers (e.g., inputs with 2, 3, 6, 7, 10, 11, 14, and 15 objects would all turn on the element representing 2)
- Is this the "right" kind of generalization?
- Another distributed scheme: a "thermometer" encoding in which each
output unit turns on for all numbers equal to or greater than some value
- Numbers similar in size have similar output patterns
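The three output schemes can be compared by counting how many units differ between the patterns for two numbers (Hamming distance), used here as an assumed stand-in for output similarity. In this sketch, with hypothetical function names and the range 0 to 15, the local code makes all distinct numbers equally different, the binary code groups numbers by shared bits (so 7 and 8 differ on every unit), and the thermometer code makes distance equal to the numerical difference.

```python
def local_number(n, size):
    """One unit per number: only unit n is on (numbers 0..size-1)."""
    return [1 if i == n else 0 for i in range(size)]

def binary_number(n, bits):
    """Each unit stands for a power of 2, most significant first."""
    return [(n >> b) & 1 for b in reversed(range(bits))]

def thermometer_number(n, size):
    """Unit i turns on for every number >= i + 1."""
    return [1 if n >= i + 1 else 0 for i in range(size)]

def hamming(u, v):
    """Number of units whose states differ."""
    return sum(a != b for a, b in zip(u, v))

print(binary_number(4, 4), binary_number(7, 4))  # [0, 1, 0, 0] [0, 1, 1, 1]
# Numbers 0..15 that turn on the unit representing 2:
print([n for n in range(16) if binary_number(n, 4)[2]])  # [2, 3, 6, 7, 10, 11, 14, 15]

print(hamming(local_number(7, 16), local_number(8, 16)),
      hamming(local_number(7, 16), local_number(15, 16)))  # 2 2
print(hamming(binary_number(7, 4), binary_number(8, 4)))   # 4
print(hamming(thermometer_number(7, 15), thermometer_number(8, 15)),
      hamming(thermometer_number(7, 15), thermometer_number(15, 15)))  # 1 8
```

For 0 to 15 the thermometer code needs 15 units, between the 4 of the binary code and the 16 of the local code.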