evaluating pages


Each page can be thought of as voting for other pages, with the strength of its vote depending on how much the two pages have in common (proper names, keyword frequencies, creation date, direct linkage, and so on). Every page is thus continuously judged by the pages the user already values. In this way, the system finds suitable beauty contestants not by asking outside judges but by asking the candidates themselves. Nobel Prize nominations work in a similar way, since past laureates are among those invited to nominate new candidates.
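The voting idea can be sketched in a few lines. This is only an illustration under stated assumptions: each page is reduced to a set of features, the strength of a vote is taken to be the Jaccard overlap of two feature sets, and the names `overlap`, `vote_scores`, and the sample pages are all hypothetical.

```python
def overlap(a: set, b: set) -> float:
    """Jaccard overlap of two feature sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def vote_scores(valued: dict, candidates: dict) -> dict:
    """Score each candidate page by the summed votes of the valued pages."""
    return {
        name: sum(overlap(features, v) for v in valued.values())
        for name, features in candidates.items()
    }


# Hypothetical pages the user already values, reduced to keyword sets.
valued = {
    "whales.html": {"whale", "ocean", "mammal"},
    "ships.html": {"ship", "ocean", "harbor"},
}
# Hypothetical pages waiting to be judged.
candidates = {
    "whaling.html": {"whale", "ship", "ocean"},
    "cooking.html": {"recipe", "oven"},
}

scores = vote_scores(valued, candidates)
```

Here the whaling page collects votes from both valued pages, while the cooking page collects none. A fuller system would presumably weight different kinds of commonality differently rather than lumping them into one set.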

The system can estimate whether its user finds a page interesting in many ways. Candidate signals include pages the user bookmarks, pages the user looks at often, pages the user has explicitly marked as interesting, pages that contain things the user has explicitly marked as interesting, pages that people the user admires find interesting, pages similar in some respect to any of the above, and all the neighbors of all those pages.

Also, "neighbors" can be interpreted not just in terms of pages, but in terms of other things too: for example, if a user expresses interest in whales, the system could interpret that as (some) interest in mammals, in sea creatures, in whaling, and so on. Of course, to do so the system would first have to build a semantic net linking all these topics.

Naturally, not every related topic should be searched for, since that would put us back at the original problem of sifting wheat from chaff. Instead, the system should look for reinforcements of interest. If at one time it decides that the user is interested in ships, that should activate whaling just a little. If it later deduces an interest in whales, that should further increase the interestingness of whaling, and so on.
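One way to picture this reinforcement is a one-hop spreading activation over the semantic net. The sketch below is an assumption-laden illustration, not a specification: the net `NET`, its link weights, and the function `reinforce` are invented for the example.

```python
from collections import defaultdict

# Hypothetical semantic net: topic -> {related topic: link strength}.
NET = {
    "ships": {"whaling": 0.2, "harbors": 0.5},
    "whales": {"whaling": 0.5, "mammals": 0.4, "sea creatures": 0.4},
}


def reinforce(activation, net, topic, amount=1.0):
    """Activate a topic fully and each of its neighbors in proportion to link strength."""
    activation[topic] += amount
    for neighbor, weight in net.get(topic, {}).items():
        activation[neighbor] += amount * weight


activation = defaultdict(float)

reinforce(activation, NET, "ships")
after_ships = activation["whaling"]    # whaling is activated only a little

reinforce(activation, NET, "whales")
after_whales = activation["whaling"]   # a second, independent hint reinforces it
```

Topics that receive activation from several independent interests rise above those touched only once, which is what lets the system avoid searching every related topic.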

Also, any website that contains many interesting pages (however measured) should automatically become interesting itself. By extension, any site that refers to that site also becomes interesting (although less so), and so on.
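The "although less so" can be modeled as a decay applied at each hop. The following is a minimal sketch, assuming a hypothetical map `referrers` from each site to the sites that refer to it; the decay factor, the cutoff `floor`, and the site names are arbitrary choices made for illustration.

```python
from collections import deque


def propagate(referrers, seed, score=1.0, decay=0.5, floor=0.1):
    """Spread interest from a seed site to the sites that refer to it, weakening per hop."""
    interest = {seed: score}
    queue = deque([seed])
    while queue:
        site = queue.popleft()
        passed = interest[site] * decay
        if passed < floor:
            continue  # too faint to be worth passing on
        for ref in referrers.get(site, ()):
            # Keep only the strongest interest seen so far for each site.
            if passed > interest.get(ref, 0.0):
                interest[ref] = passed
                queue.append(ref)
    return interest


# Hypothetical referral map: site -> sites that link to it (note the cycle).
referrers = {
    "cetacea.example": ["marine.example"],
    "marine.example": ["portal.example"],
    "portal.example": ["cetacea.example"],
}

interest = propagate(referrers, "cetacea.example")
```

Because a site's interest is only ever raised and the passed-on amount shrinks at every hop, the loop terminates even when sites refer to each other in a cycle.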

The system is essentially building a semantic network of topics the user might find interesting: a web within the World Wide Web, dedicated to one particular user's interests.

Here are some characteristics specific to webpages:

* What language is it in?
* Do more pages point to it than it points to? (such a measure is an estimate of how seminal the page is)
* How hard is it to download? (a measure either of the page's popularity or of its home's instability)
* Who wrote it?
* Who points to it?
* Who does it point to?
* How popular is it? (with the home pages, with other pages the user values, with other users)
* How many other pages point to it?
* How many pages does it point to?
* How dense are the connections among its neighbors?
* How far away is it? (measured in terms of shortest linkage distance from a home page)
* What is it about?
* What kind of site is its home site? (topic (if it's prose), language, ftp site, gopher site, usenet article or archive, pictures, sounds, or videos)
* What is it related to? (to the user, to others the user values, to everyone)
* What names appear in it? (the user's name, other personal names, software names, company names, country names, and so on)
* Where is it? (included in a site important to the user, others the user values, everyone)
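Several of these characteristics are purely structural and can be read straight off a link graph. As a rough sketch, assuming a hypothetical dictionary `links` that maps each page to the set of pages it points to, the inbound and outbound counts and the shortest linkage distance from a home page might be computed like this:

```python
from collections import deque


def link_counts(links, page):
    """Return (pages pointing to `page`, pages `page` points to)."""
    inbound = sum(1 for targets in links.values() if page in targets)
    outbound = len(links.get(page, ()))
    return inbound, outbound


def linkage_distance(links, home, page):
    """Shortest number of links from `home` to `page`, or None if unreachable."""
    seen = {home}
    queue = deque([(home, 0)])
    while queue:
        current, dist = queue.popleft()
        if current == page:
            return dist
        for nxt in links.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


# Hypothetical link graph: page -> pages it points to.
links = {
    "home": {"a", "b"},
    "a": {"c"},
    "b": {"c"},
    "c": set(),
}

inbound, outbound = link_counts(links, "c")
distance = linkage_distance(links, "home", "c")
```

In this toy graph, page `c` is pointed to by two pages and points to none, which by the "seminal" measure above would count in its favor. The other characteristics (authorship, topic, language) need content analysis and are not covered by this sketch.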

The system might even be able to "explain" why it rates a page as interesting by displaying the page's activations along all of these dimensions (arrayed in a two-dimensional map). Further, should the page prove very popular with the user, the system can turn that map into a fingerprint and use it to look for other pages like it.
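A fingerprint of this kind could be as simple as a vector of activations, with cosine similarity as the matching measure. The text does not say how fingerprints would be compared, so the cosine measure here is a swapped-in assumption, and the dimension names and values below are invented.

```python
import math


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity of two sparse activation vectors (0.0 if either is all zero)."""
    dot = sum(value * b.get(dim, 0.0) for dim, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical fingerprint of a page the user likes: dimension -> activation.
favorite = {"topic:whales": 0.9, "language:en": 1.0, "popularity": 0.6}

# Hypothetical candidates to compare against that fingerprint.
similar = {"topic:whales": 0.8, "language:en": 1.0, "popularity": 0.5}
unrelated = {"topic:cooking": 1.0}

score_similar = cosine(favorite, similar)
score_unrelated = cosine(favorite, unrelated)
```

The same per-dimension values that feed the similarity score are what the system would display when asked to explain itself.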

Since the system will inevitably make classification mistakes, it should be user-modifiable so that pages that aren't close in keyword confluence, or any of the other automated measures used, but which the user feels are close in some semantic sense, can be moved closer together.
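One simple way to make the map user-modifiable is to let a user-supplied distance take precedence over the automated one. This is a sketch only: `page_distance`, the `overrides` table, and the stand-in automated measure are hypothetical names chosen for the example.

```python
def page_distance(a, b, automated, overrides):
    """Return the user's distance for a pair of pages if one exists, else the automated one."""
    key = frozenset((a, b))  # order of the two pages should not matter
    if key in overrides:
        return overrides[key]
    return automated(a, b)


def keyword_distance(a, b):
    """Stand-in for an automated measure; here every pair looks far apart."""
    return 1.0


# The user decides two pages belong close together despite sharing no keywords.
overrides = {frozenset(("moby-dick.html", "obsession.html")): 0.1}

moved = page_distance("obsession.html", "moby-dick.html", keyword_distance, overrides)
untouched = page_distance("moby-dick.html", "cooking.html", keyword_distance, overrides)
```

Keeping the overrides in a separate table leaves the automated measures untouched, so the user's corrections survive when those measures are recomputed.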

Further, the user should be able to blacklist a site and have that blacklisting appear in the user's local neighborhood map.

Finally, the user should be able to exchange linkage information with other users so that the information mapping ability of each user in the group is magnified to the mapping ability of the group (in other words, users should have automatic word-of-mouth recommendations about new sites and semantic linkages). Localization, blacklisting, and information sharing all cut down on the cognitive effort of remembering long lists of unrelated things.
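Exchanging linkage information while honoring a blacklist might look like the sketch below. It assumes each user's map is a dictionary from a site to the set of sites it is linked with; `merge_link_maps` and the sample data are hypothetical.

```python
def merge_link_maps(own, shared_maps, blacklist):
    """Union the user's link map with maps from other users, dropping blacklisted sites."""
    merged = {}
    for link_map in [own, *shared_maps]:
        for site, neighbors in link_map.items():
            if site in blacklist:
                continue
            kept = {n for n in neighbors if n not in blacklist}
            merged.setdefault(site, set()).update(kept)
    return merged


# Hypothetical maps: the user's own, and one received from a friend.
own = {"a.example": {"b.example"}}
friend = {
    "a.example": {"c.example", "spam.example"},
    "spam.example": {"a.example"},
}
blacklist = {"spam.example"}

merged = merge_link_maps(own, [friend], blacklist)
```

The user gains the friend's link from `a.example` to `c.example` without ever seeing the blacklisted site, either as an entry or as a neighbor.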


