
Layer 1 - Most Frequent Words

The bottom layer (Layer 1) performs initial processing on the text stream. Through a competitive learning process, its units become assigned to the most frequently occurring words. In the current implementation, the layer contains 150 units, so it can be sensitive to at most 150 unique words at a time. The number of units matters because this is a competitive network: with too many units there is too little competition and the layer does not single out the correct words; with too few, competition is so intense that the layer cannot learn all the words it needs. The algorithm appears to work well over a reasonably large range of layer sizes, but we have not yet conducted a comprehensive analysis of the effect of layer size.

Words are assigned to units as follows. Each of the 150 units is associated with a unique word. As documents are read, each term passes through the bottom layer in the order in which it occurs in the document. If the word is already associated with a unit in the layer, the excitement of that unit is increased by a value $\alpha$ (see below). If the word is not associated with any unit, it is given a chance to ``take over'' a randomly chosen unit. The word takes over the chosen unit with probability $0.0001(e-100)^2$, where $e$ is the excitement of the unit and ranges from 0 to 100; the chance of taking over a unit therefore decreases as its excitement grows. In addition, the excitement values of all units decay at a rate $\beta$.
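The following Python sketch illustrates one possible implementation of this assignment rule. The layer size of 150 and the takeover probability $0.0001(e-100)^2$ come from the description above; the concrete values of $\alpha$ and $\beta$, the use of proportional decay applied after every incoming term, and the choice to reset a captured unit's excitement to zero are assumptions made here for illustration, since they are not specified in the text.

    import random

    NUM_UNITS = 150   # layer size from the text
    ALPHA = 1.0       # assumed excitement increment per matching word
    BETA = 0.01       # assumed decay rate applied to every unit each step

    class Unit:
        """One Layer 1 unit: a word slot with an excitement level in [0, 100]."""
        def __init__(self):
            self.word = None        # word currently associated with this unit
            self.excitement = 0.0   # ranges from 0 to 100

    def process_term(units, word_to_unit, term):
        """Feed a single term from the document stream through Layer 1."""
        unit = word_to_unit.get(term)
        if unit is not None:
            # The word already owns a unit: raise that unit's excitement by alpha.
            unit.excitement = min(100.0, unit.excitement + ALPHA)
        else:
            # Unknown word: it may take over a randomly chosen unit with
            # probability 0.0001 * (e - 100)^2, which is 1 when e = 0 and 0 when e = 100.
            candidate = random.choice(units)
            p_takeover = 0.0001 * (candidate.excitement - 100.0) ** 2
            if random.random() < p_takeover:
                if candidate.word is not None:
                    del word_to_unit[candidate.word]
                candidate.word = term
                candidate.excitement = 0.0   # assumption: a captured unit starts over
                word_to_unit[term] = candidate

        # All units decay at rate beta (modeled here as proportional decay per term).
        for u in units:
            u.excitement = max(0.0, u.excitement * (1.0 - BETA))

    def process_document(units, word_to_unit, terms):
        for term in terms:
            process_term(units, word_to_unit, term)

    # Example usage on a toy term stream.
    units = [Unit() for _ in range(NUM_UNITS)]
    word_to_unit = {}
    process_document(units, word_to_unit, "the cat sat on the mat the cat".split())

In this sketch, words that recur keep their units excited and thus hard to displace, while units holding rare words decay toward zero excitement and become easy targets for takeover, which is what drives the layer toward the most frequent words.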

