
Building a Known Space Engine

The knowledge discovery and mapping problem, whether solved by a machine or a person (or both), can be divided into four parts: discovering candidate documents, filtering them for relevance, mapping the keepers into a coherent layout, and displaying that layout so the user can navigate it.

To solve this problem, a useful semi-autonomous webmapmaker needs at least four parts: ferrets to discover candidate pages, filterers to judge their relevance, mapmakers to place the keepers in Known Space, and an interface to display the resulting map.

At present, the first two parts, the ferrets and the filterers, are separated for design purposes only. Today's webagents only appear to roam the web; in fact they sit still on their user's machine, so it's currently reasonable to combine ferrets and filterers into one set of information discoverers. However, when distributed multi-platform agents become widespread, automated information discoverers could actually roam the web, moving from website to website and doing computations at each site. At that point, program size will become an issue, and small, fast ferrets will likely start to separate from large, slow filterers.
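
To make that division concrete, here is a minimal sketch, assuming a Python implementation, of ferrets and filterers kept as separate pieces but combined, for now, into one local discoverer. All of the class and method names are hypothetical illustrations of the design, not an actual Known Space API.

    # Hypothetical sketch: ferret and filterer as separable pieces,
    # currently combined into one discoverer on the user's machine.
    # All names are illustrative, not an actual Known Space API.

    class Ferret:
        """Small and fast: fetches candidate pages."""
        def fetch(self, url):
            raise NotImplementedError  # in practice, an HTTP GET

    class Filterer:
        """Large and slow: decides whether a page is worth keeping."""
        def interesting(self, text):
            raise NotImplementedError  # in practice, fingerprint matching

    class Discoverer:
        """Today's combined agent: it sits still; only its requests roam."""
        def __init__(self, ferret, filterer):
            self.ferret = ferret
            self.filterer = filterer

        def discover(self, url):
            text = self.ferret.fetch(url)
            return text if self.filterer.interesting(text) else None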

Eventually a user's Known Space might grow so large that the user needs a map to find things on the map. To this end, users need to be able to lump whole sections of the map into one cluster (say ``Entertainment'' or ``Research'' or ``Business Opportunities'') and have that cluster open to them just as any other document does. Depending on the complexity and variety of the user's interests, this hierarchy could be extended so that the document enclosure progression might go: documents, clusters, rooms, buildings, neighborhoods, strips, districts, boroughs, towns, cities, states, countries, worlds.
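
That enclosure progression can be sketched, again assuming Python, by treating every level of the hierarchy uniformly, so that a cluster opens just as a document does. The class and field names here are invented for illustration.

    # Hypothetical sketch of the enclosure hierarchy: a document is a
    # leaf, and a cluster/room/building/... is a node that encloses
    # other nodes. Names are illustrative only.

    class Node:
        def __init__(self, name, children=None):
            self.name = name
            self.children = children or []  # empty for a plain document

        def open(self):
            """A cluster opens just as any other document does."""
            return self.children or self.name

        def find(self, name):
            """The 'map of the map': search the whole enclosure tree."""
            if self.name == name:
                return self
            for child in self.children:
                hit = child.find(name)
                if hit is not None:
                    return hit
            return None

    # Example: Entertainment is a cluster inside the top-level map.
    space = Node("Known Space", [
        Node("Entertainment", [Node("movie-reviews.html")]),
        Node("Research", []),
    ])
    assert space.find("movie-reviews.html") is not None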

Once enough documents have been clumped together in a particular cluster and the user is no longer occupying the communications line, the user's Known Space should automatically create a ferret to fetch the pages in that cluster, analyze their keyword content, and build a fingerprint function that roughly distinguishes those documents from all the others presently in the map. Another ferret can then start roaming the web using all the webpages in that cluster as starting points, downloading, in a breadth-first search, all the pages those starter pages point to. (Perhaps the ferret even submits search requests to the commercial search engines to generate new pages that might be related to the starter pages.) As the new pages come in, a filterer analyzes each one to find which of the precomputed fingerprints best matches it. That gives the mapmaker enough information to place the new pages in roughly correct locations in Known Space (perhaps marked in some distinctive color to show that they're new).
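
The cascade itself can be sketched as follows. The fingerprint scheme shown, keyword-frequency vectors compared by cosine similarity, is one plausible choice rather than something the text specifies, and fetch_page and extract_links stand in for whatever downloading and link-extraction machinery a real ferret would carry.

    # Hypothetical sketch of the cascade: fingerprint a cluster, then
    # crawl breadth-first from its pages and let a filterer match each
    # new page against the precomputed fingerprints. The fingerprint
    # scheme and all names are assumptions for illustration.

    from collections import Counter, deque
    import math
    import re

    def keyword_vector(text):
        """Crude keyword content: lowercase word frequencies."""
        return Counter(re.findall(r"[a-z]+", text.lower()))

    def fingerprint(pages):
        """One vector roughly characterizing a cluster's pages."""
        total = Counter()
        for text in pages:
            total.update(keyword_vector(text))
        return total

    def similarity(v, w):
        """Cosine similarity between two keyword vectors."""
        dot = sum(v[word] * w[word] for word in v)
        norm = (math.sqrt(sum(x * x for x in v.values())) *
                math.sqrt(sum(x * x for x in w.values())))
        return dot / norm if norm else 0.0

    def cascade(starters, fingerprints, fetch_page, extract_links, limit=100):
        """Breadth-first crawl from a cluster's starter pages.

        Returns (url, best_cluster) pairs for the mapmaker to place,
        marked as new."""
        queue = deque(starters)
        seen = set(starters)
        placements = []
        while queue and len(placements) < limit:
            url = queue.popleft()
            text = fetch_page(url)                      # ferret's job
            vec = keyword_vector(text)
            best = max(fingerprints,                    # filterer's job
                       key=lambda c: similarity(vec, fingerprints[c]))
            placements.append((url, best))              # mapmaker's input
            for link in extract_links(text):
                if link not in seen:
                    seen.add(link)
                    queue.append(link)
        return placements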

Once this automated cascade has completed, the entire system shuts down to avoid combinatorial explosion. Some time later, the user returns to the bookmarks file and browses some of the new pages, deleting some and keeping others. The deleted pages become ``anti-pages'' and go to the filterers, which try to determine what makes them uninteresting; the accepted pages go to the ferrets, which try to determine what makes them interesting. The ferrets then create fingerprints for those kinds of pages and the whole cycle repeats.
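
The text leaves open exactly how anti-pages and accepted pages reshape the fingerprints. One standard technique that fits the description is Rocchio-style relevance feedback: shift a fingerprint toward the accepted pages and away from the anti-pages. The weights alpha and beta below are invented, and keyword_vector is reused from the previous sketch.

    # Hypothetical sketch of the feedback cycle using Rocchio-style
    # relevance feedback; alpha and beta are assumed weights, and
    # keyword_vector is the helper defined in the previous sketch.

    from collections import Counter

    def refine_fingerprint(old, accepted, anti_pages, alpha=1.0, beta=0.5):
        """Move a fingerprint toward accepted pages, away from anti-pages."""
        new = Counter(old)
        for text in accepted:             # what makes pages interesting
            for word, n in keyword_vector(text).items():
                new[word] += alpha * n
        for text in anti_pages:           # what makes pages uninteresting
            for word, n in keyword_vector(text).items():
                new[word] -= beta * n
        # Keep only positively weighted keywords for the next cycle.
        return Counter({w: c for w, c in new.items() if c > 0})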


Gregory J. E. Rawlins
1/13/1998