Next: The WordSieve Architecture Up: WordSieve: A Method for Extraction Previous: Content or Context?

The WordSieve Algorithm for Context Extraction

WordSieve's goal is to find terms associated with document access patterns. The WordSieve algorithm finds groups of documents that tend to be accessed together, and indexes documents by frequently occurring terms that also partition the documents. Our hypothesis is that these terms are good indicators of task context. We evaluate this hypothesis in Section 5.
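To make the idea of terms that are both frequent and partitioning concrete, the following toy sketch may help. It is not the WordSieve algorithm itself: it assumes the groups of co-accessed documents are already known, and the function name `partitioning_terms`, the `min_count` threshold, and the sample data are all hypothetical choices made for illustration.

```python
from collections import Counter


def partitioning_terms(groups, min_count=2):
    """Return terms that are frequent overall but absent from some group.

    `groups` is a list of document groups; each group is a list of
    documents, and each document is a list of lowercase tokens.
    This is an illustrative assumption, not WordSieve's method.
    """
    total = Counter()       # occurrences of each term across all groups
    group_hits = Counter()  # number of groups in which each term appears
    for group in groups:
        seen = set()
        for doc in group:
            total.update(doc)
            seen.update(doc)
        group_hits.update(seen)
    # Keep terms that occur often enough and do not appear in every group.
    return sorted(
        term
        for term, count in total.items()
        if count >= min_count and group_hits[term] < len(groups)
    )


# Hypothetical data: two groups of documents accessed together.
groups = [
    [["genetic", "algorithm", "the"], ["genetic", "fitness", "the"]],
    [["butterfly", "garden", "the"], ["butterfly", "the"]],
]
print(partitioning_terms(groups))  # ['butterfly', 'genetic']
```

In this toy data "the" is frequent but appears in both groups, so it does not separate them, while "algorithm" separates the groups but is too rare to pass the threshold. Only terms satisfying both conditions are kept.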

Because WordSieve automatically extracts terms associated with sets of document accesses, rather than using explicit task descriptions, WordSieve does not require a user to specify when one task is finished and another has begun. Thus there is no need for the user to artificially limit browsing behavior to provide task-related information (which a user would be unlikely to do in practice).

WordSieve's context representation and its system design reflect several constraints affecting real-time information retrieval agents that assist users as they perform other tasks:

1. The system must be relatively compact and should consume only limited resources. It should not, for example, require storing and re-processing previously accessed documents, and consequently must accumulate its contextual information along the way.
2. The system must run in real time. It must make its suggestions while the user is performing the task for which they are relevant.
3. The system should develop a user profile, reflecting the access patterns of the particular user, in order to provide personalized recommendations likely to be useful for that user.
4. The system should be able to use the user profile to produce a context profile when the user is accessing documents, reflecting both the user and the current task.
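The first two constraints (bounded resources and incremental, real-time operation) can be illustrated with a small sketch of a fixed-capacity term table that is updated one token at a time and never stores the documents themselves. This is only an assumption about one way such constraints could be met, not a description of WordSieve's actual data structures. The class name `BoundedTermTable`, its `capacity` parameter, and the evict-the-weakest policy are hypothetical.

```python
class BoundedTermTable:
    """Fixed-capacity term counter updated one token at a time.

    Illustrative only: memory use is bounded by `capacity`, and no
    document text is retained after its tokens have been observed.
    """

    def __init__(self, capacity=100):
        self.capacity = capacity
        self.counts = {}

    def observe(self, term):
        """Record one occurrence of `term` as it streams past."""
        if term in self.counts:
            self.counts[term] += 1
        elif len(self.counts) < self.capacity:
            self.counts[term] = 1
        else:
            # Table is full: evict the lowest-count term to make room.
            weakest = min(self.counts, key=self.counts.get)
            del self.counts[weakest]
            self.counts[term] = 1

    def top(self, n):
        """Return the n highest-count terms, ties broken alphabetically."""
        return sorted(self.counts, key=lambda t: (-self.counts[t], t))[:n]


# Hypothetical token stream from documents the user is browsing.
table = BoundedTermTable(capacity=2)
for token in "genetic algorithm genetic fitness genetic".split():
    table.observe(token)
print(table.top(1))  # ['genetic']
```

Because each update touches only the table, the cost per token does not depend on how many documents have been seen, which is the property the first two constraints ask for.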


Travis Bauer
2002-01-25