personalized information maps


The purpose of intelligent information processing in general seems to be creation of simplified images of the observable world at various levels of abstraction, in relation to a particular subset of received data.
Teuvo Kohonen, Self-Organizing Maps

Trying to find information on the web is like trying to find something at a huge jumble sale: it's fun, and you can make serendipitous discoveries, but for directed search it's better to go to a department store; there, someone has already done much of the arranging for you. Unfortunately, the web's growth, diversity, and volatility make human indexing impossible.

The Library of Congress, one of the world's most comprehensive collections of human knowledge, holds 112 million items (17 million books, 95 million maps, manuscripts, photos, films, tapes, paintings, prints, drawings, and other items) stretching over 532 miles of shelves. As of May 1998, however, the AltaVista search engine indexed over 140 million pages, which at that time was probably only around a third of the entire web (The New York Times, April 9, 1998, estimated the total then as 320 million pages).

Further, the Library of Congress collection is growing by only 7,000 items every working day, while the web is growing by more than 1,000 pages a minute (Wired, July 1998, page 59). The number of pages should cross a billion well before January 1st, 2000. And many of those pages are constantly changing, and constantly moving.
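
The billion-page projection can be checked with simple arithmetic. The following is a minimal Python sketch that assumes linear growth from the New York Times estimate (320 million pages on April 9, 1998) at exactly 1,000 pages a minute; since the quoted rate is a lower bound, the projected date is conservative. The function names are illustrative only.

```python
from datetime import datetime, timedelta

# Figures quoted in the text: about 320 million pages on April 9, 1998
# (The New York Times) and growth of at least 1,000 pages a minute (Wired).
START = datetime(1998, 4, 9)
START_PAGES = 320_000_000
PAGES_PER_MINUTE = 1_000  # assumption: constant rate, so a conservative linear estimate


def pages_on(when: datetime) -> int:
    """Projected page count at `when` under linear growth."""
    minutes = (when - START).total_seconds() / 60
    return START_PAGES + int(minutes * PAGES_PER_MINUTE)


def date_reaching(target_pages: int) -> datetime:
    """First moment the linear projection reaches `target_pages`."""
    minutes_needed = (target_pages - START_PAGES) / PAGES_PER_MINUTE
    return START + timedelta(minutes=minutes_needed)


print(date_reaching(1_000_000_000).date())  # 1999-07-25
print(pages_on(datetime(2000, 1, 1)))       # 1230080000
```

Under these assumptions the web passes a billion pages in late July 1999 and reaches about 1.23 billion pages by January 1st, 2000.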

The information overload problem isn't restricted to the web: the desktop itself is rapidly approaching the breaking point as well. As of November 24th, 1998, an 18 gigabyte disk drive costs $350, or about $20 a gigabyte, so $1,600 buys about 80 gigabytes. Since a 500-page textbook takes up about 1/2 megabyte (compressed), that is enough for the text of about 160,000 books. If prices keep halving every year, next year the same $1,600 will buy space for over 300,000 books, and the following year over 600,000. At that rate, the text of the Library of Congress's 17 million books could be sitting on your desk by about 2005.

Further, the average text page is only 3 to 4 kilobytes, so $1,600 buys storage for at least 20 million text pages. Today's operating systems, however, were designed in an era when managing a few hundred pages was all that was required. So today's users are given the hardware to store millions of pages and the software to manage only a few hundred.
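
The page-count figure follows directly from the quoted drive price. Here is a small Python check, assuming decimal units (1 GB = 10^9 bytes, 1 KB = 10^3 bytes, as drive capacities are usually quoted) and the text's rounded price of $20 a gigabyte; the helper names are hypothetical.

```python
# Figures quoted in the text (November 1998): an 18 GB drive for $350,
# rounded to about $20 per gigabyte, and a $1,600 budget.
# Assumption: decimal units, 1 KB = 10**3 bytes and 1 GB = 10**9 bytes.
PRICE_PER_GB = 20  # dollars, rounded as in the text
BUDGET = 1_600     # dollars
KB, GB = 10**3, 10**9


def gigabytes_for(budget: float, price_per_gb: float = PRICE_PER_GB) -> float:
    """Storage a given budget buys at a flat price per gigabyte."""
    return budget / price_per_gb


def pages_that_fit(budget: float, bytes_per_page: int) -> int:
    """Whole text pages of a given size that fit in the purchased storage."""
    capacity_bytes = gigabytes_for(budget) * GB
    return int(capacity_bytes // bytes_per_page)


print(350 / 18)                        # unrounded price: about 19.44 dollars per GB
print(gigabytes_for(BUDGET))           # 80.0
print(pages_that_fit(BUDGET, 4 * KB))  # 20000000 pages at 4 KB each
print(pages_that_fit(BUDGET, 3 * KB))  # 26666666 pages at 3 KB each
```

With 4 KB pages the budget holds exactly 20 million pages, and with 3 KB pages about 26.7 million, so 20 million is the conservative end of the range.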

As the web grows, it becomes ever easier to get lost in it. Avoiding that would seem to require mapping the web first, which is probably impossible. Even if it could be done, such a map would be of limited use to any individual user, since it would have to be very general. And of course, it would always be out of date.

It is possible, however, to map relevant portions of the web incrementally, starting with a nucleus of pages that a particular user has already shown interest in and branching out from there. Each new page can then be placed relative to the other pages in a two- or three-dimensional space of pages, aiding search, organization, and recall. The same approach can be applied to the desktop itself. In both cases, the key is to focus on the interests of each individual user, analyze pages to find out what relates them, and map the pages to a space the user can navigate.
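
The text does not say which technique places pages in the space. One plausible choice, in the spirit of the Kohonen epigraph, is a self-organizing map; the Python sketch below uses a minimal one on a two-dimensional grid. It assumes each page is reduced to a unit-length vector of word counts over a small fixed vocabulary. `term_vector`, `PageMap`, and the sample pages are hypothetical illustrations, not an existing system. The map is first trained on the user's nucleus of pages; each newly found page is then assigned to its best-matching grid cell, and the map is nudged slightly toward it so it adapts as it branches out.

```python
import math
import random
from collections import Counter


def term_vector(text: str, vocab: list[str]) -> list[float]:
    """Hypothetical page representation: unit-length word counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    vec = [float(counts[word]) for word in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero for empty pages
    return [v / norm for v in vec]


def sq_dist(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


class PageMap:
    """Minimal Kohonen self-organizing map on a width x height grid of cells."""

    def __init__(self, width: int, height: int, dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.width, self.height = width, height
        self.nodes = {(x, y): [rng.random() for _ in range(dim)]
                      for x in range(width) for y in range(height)}

    def best_match(self, vec: list[float]) -> tuple[int, int]:
        """Grid cell whose weight vector is closest to vec."""
        return min(self.nodes, key=lambda cell: sq_dist(self.nodes[cell], vec))

    def update(self, vec: list[float], rate: float, radius: float) -> None:
        """Pull the winning cell, and more weakly its grid neighbors, toward vec."""
        bx, by = self.best_match(vec)
        for (x, y), weights in self.nodes.items():
            grid_d2 = (x - bx) ** 2 + (y - by) ** 2
            influence = math.exp(-grid_d2 / (2 * radius ** 2))
            for i, target in enumerate(vec):
                weights[i] += rate * influence * (target - weights[i])

    def train(self, vectors: list[list[float]], epochs: int = 200) -> None:
        """Online training on the nucleus, shrinking rate and neighborhood over time."""
        start_radius = max(self.width, self.height) / 2
        for t in range(epochs):
            frac = t / epochs
            for vec in vectors:
                self.update(vec, rate=0.5 * (1 - frac) + 0.01,
                            radius=start_radius * (1 - frac) + 0.5)

    def place(self, vec: list[float], rate: float = 0.05) -> tuple[int, int]:
        """Place a newly found page, then nudge the map slightly toward it."""
        cell = self.best_match(vec)
        self.update(vec, rate, radius=0.5)
        return cell


# Usage with a toy nucleus covering two interests (hypothetical sample pages).
vocab = ["neural", "network", "map", "recipe", "pasta", "sauce"]
nucleus = ["neural network map", "self organizing neural map", "network map neural",
           "pasta recipe", "pasta sauce recipe", "tomato sauce pasta"]
vectors = [term_vector(text, vocab) for text in nucleus]

som = PageMap(4, 4, len(vocab))
som.train(vectors)
for text, vec in zip(nucleus, vectors):
    print(som.best_match(vec), text)
print("new page ->", som.place(term_vector("neural map tutorial", vocab)))
```

Related pages end up in the same or nearby cells, which is what makes the grid navigable. A real system would likely need a far larger vocabulary, better term weighting, and a way to grow the grid as the user's interests widen.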

Within this website, "pages" refers to webpages, documents, executables, images, audio or video clips, symbolic links, or directories: any digital data at all.


