Next: Examining the Issues Up: Designing a System Previous: Known Space Ferrets and

Webpage Beauty Contests

Each webpage can be thought of as voting on the importance of every other webpage, based on whether it links to that page or is linked to by it. More generally, one page can be thought of as voting for another depending on what the two pages have in common (proper names, keyword frequencies, and so on). In this way, Known Space is trying to find suitable contestants for a beauty contest that is continuously judged by the webpages the user has already thought enough of to make home documents.
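The voting idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the page representation (a dict with `url`, `links`, and `terms`), the one-point weight per link, and the use of Jaccard overlap for "things in common" are all hypothetical choices, not part of the Known Space design.

```python
def vote(judge, candidate):
    """One judge page's vote for a candidate page (weights are assumptions)."""
    score = 0.0
    if candidate["url"] in judge["links"]:
        score += 1.0  # the judge links to the candidate
    if judge["url"] in candidate["links"]:
        score += 1.0  # the candidate links to the judge
    shared = judge["terms"] & candidate["terms"]
    union = judge["terms"] | candidate["terms"]
    if union:
        score += len(shared) / len(union)  # Jaccard overlap of shared terms
    return score


def rank_contestants(home_documents, candidates):
    """Total the votes cast by the user's home documents, best candidate first."""
    totals = {
        c["url"]: sum(vote(judge, c) for judge in home_documents)
        for c in candidates
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Only the user's home documents act as judges here, which mirrors the beauty-contest picture: the candidates are whatever pages the system has found, and the ranking changes whenever the set of home documents changes.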

Known Space can estimate webpage interestingness in many ways. It can start from documents in the user's webhome, areas the user has explicitly told the system are of interest, pages that people the user admires find interesting, and all the neighbors of those pages. Also, "neighbors" can be interpreted not just in terms of webpages but in terms of topics: for example, if a user expresses interest in whales, Known Space could interpret that as (some) interest in mammals, in sea creatures, in whaling, and so on (of course, to do so Known Space would first need a semantic net linking all these topics). Naturally, not all of these topics should be searched for, since that would put us back at the original problem of sifting wheat from chaff. Instead, Known Space should look for reinforcements of interest. If at one time it decides that the user is interested in ships, that should activate whaling just a little. If it later deduces an interest in whales, that should further increase the interestingness of whaling, and so on.
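The reinforcement idea resembles one-step spreading activation over a semantic net. A minimal sketch, assuming a hypothetical `InterestNet` class, an undirected topic graph, and an arbitrary spill-over factor of 0.25 (none of these come from the Known Space design itself):

```python
from collections import defaultdict


class InterestNet:
    """Toy semantic net in which interest in a topic spills onto its neighbors."""

    def __init__(self, spill=0.25):
        self.neighbors = defaultdict(set)
        self.activation = defaultdict(float)
        self.spill = spill  # assumed fraction of interest passed to each neighbor

    def link(self, a, b):
        """Record that two topics are related (undirected)."""
        self.neighbors[a].add(b)
        self.neighbors[b].add(a)

    def express_interest(self, topic, amount=1.0):
        """Activate a topic fully and each direct neighbor just a little."""
        self.activation[topic] += amount
        for other in self.neighbors[topic]:
            self.activation[other] += amount * self.spill

    def reinforced(self, threshold):
        """Topics whose accumulated activation has reached the threshold."""
        return sorted(t for t, a in self.activation.items() if a >= threshold)
```

With links ships-whaling, whales-whaling, and whales-mammals, an interest in ships alone leaves whaling at 0.25. A later interest in whales raises whaling to 0.5 while mammals stays at 0.25, so only whaling crosses a threshold of 0.5 among the topics the user never named.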

Also, any website that contains many interesting documents (however measured) should automatically become interesting itself. By extension, any site that refers to that site also becomes interesting (although less so), and so on.
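One way to picture this propagation is a damped score passed backward along site references. The sketch below is an assumption-laden illustration: the function name, the choice to sum page scores per site, the damping factor of 0.5, and the fixed hop limit are all hypothetical.

```python
def site_interest(page_scores, site_of, refers_to, damping=0.5, hops=3):
    """Score sites by their pages, then let referring sites inherit a damped share.

    page_scores: page -> interestingness (however measured)
    site_of:     page -> site that contains it
    refers_to:   site -> set of sites it refers to
    """
    base = {}
    for page, score in page_scores.items():
        site = site_of[page]
        base[site] = base.get(site, 0.0) + score

    sites = set(base) | set(refers_to)
    for targets in refers_to.values():
        sites |= set(targets)
    score = {s: base.get(s, 0.0) for s in sites}

    # Each round pushes interest one more reference away, weakened by damping.
    for _ in range(hops):
        updated = dict(score)
        for site, targets in refers_to.items():
            inherited = damping * max((score[t] for t in targets), default=0.0)
            updated[site] = max(updated[site], inherited)
        score = updated
    return score
```

Taking the maximum rather than a sum keeps reference cycles from inflating scores, because inherited interest can never exceed the damped score of the most interesting site referred to.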

Known Space is essentially building a semantic network of topics the user might find interesting. This is a web within the world wide web dedicated to one particular user's interests.

Here are some obvious characteristics of a webpage:

Known Space might be able to "explain" its choice of interestingness for a webpage by displaying the set of activations along all these dimensions (arrayed in a two-dimensional map). Further, if the page proves very popular with the user, Known Space can turn that map into a fingerprint and use it to test for other pages like it.
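A fingerprint of this kind could be as simple as a normalized activation vector compared by cosine similarity. In the sketch below the dimension names are placeholders and the cosine measure is an assumption; the text does not say how fingerprints would be built or matched.

```python
import math


def fingerprint(activations, dimensions):
    """Turn a page's per-dimension activations into a unit-length vector."""
    vec = [activations.get(d, 0.0) for d in dimensions]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def similarity(fp_a, fp_b):
    """Cosine similarity of two fingerprints (a dot product, since both are unit length)."""
    return sum(a * b for a, b in zip(fp_a, fp_b))


def explain(activations, dimensions):
    """List the dimensions that contributed, strongest first."""
    active = [(d, activations.get(d, 0.0)) for d in dimensions]
    return sorted((p for p in active if p[1] > 0), key=lambda p: p[1], reverse=True)
```

A candidate page whose fingerprint scores close to 1.0 against a popular page's fingerprint is active along the same dimensions in the same proportions, which is one plausible reading of "other pages like that page".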

This classification scheme is like judging scientists for the Nobel Prize (who thinks they're good? what's their track record?). It's also like going to the library to find something interesting to read. And it's like judging other people (who do I know who vouches for this person, in whatever implicit way?).

Since Known Space will inevitably make classification mistakes, it should be user-modifiable: things that aren't close in keyword confluence, or in any of the other automated measures used, but which the user feels are close in some semantic sense, can then be moved closer together. Further, the user should be able to blacklist a site and have that blacklisting appear in the user's local neighborhood map. Finally, the user should be able to exchange linkage information with other users, so that the information-mapping ability of each user in the group is magnified to the mapping ability of the entire group (in other words, users should get automatic word-of-mouth recommendations about new sites and semantic linkages). Localization, blacklisting, and information sharing all cut down on the cognitive effort of remembering long lists of unrelated things. Were we not creatures with extremely limited memories, all books would be one long sentence.
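These three user controls can be sketched as a small data structure. Everything here is hypothetical: the `NeighborhoodMap` class, the rule that a user's own placement beats both the automated measure and a shared one, and the choice to share only linkage information while keeping blacklists personal.

```python
class NeighborhoodMap:
    """Toy neighborhood map: automated distances, user overrides, a blacklist."""

    def __init__(self):
        self.automated = {}     # frozenset({a, b}) -> distance from automated measures
        self.overrides = {}     # frozenset({a, b}) -> distance chosen by a user
        self.blacklist = set()  # sites this user has banned

    def set_automated(self, a, b, distance):
        self.automated[frozenset((a, b))] = distance

    def move_closer(self, a, b, distance):
        """Record the user's own judgment of how close two items are."""
        self.overrides[frozenset((a, b))] = distance

    def ban(self, site):
        self.blacklist.add(site)

    def is_blacklisted(self, site):
        return site in self.blacklist

    def distance_between(self, a, b):
        """User overrides win over automated measures; None if nothing is known."""
        key = frozenset((a, b))
        return self.overrides.get(key, self.automated.get(key))

    def merge_from(self, other):
        """Adopt another user's linkages wherever this user has none of their own."""
        for key, distance in other.overrides.items():
            self.overrides.setdefault(key, distance)
```

Using `setdefault` in `merge_from` means a shared linkage works like a word-of-mouth recommendation: it fills gaps in the user's map but never overrules a placement the user made personally.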


Gregory J. E. Rawlins
1/13/1998