These simpletons roam information spaces (the web, FTP sites, the desktop, mail and news servers, and so on), trying to fetch information that might interest the user. The simpletons determine what would interest the user from their specification pages. These pages are written and constantly modified by the user model to reflect the user's preferences when using the system.
Each collector simpleton specializes in a specific information space.
Thus, web simpletons would be concerned
with looking for interesting things on the web.
Email simpletons would exclusively examine the user's mail.
News simpletons might constantly poll a news server to
check for information the user might find interesting, and so on.
Examiner simpletons analyze the incoming data and associate with it a set of attributes that describe the data. Each examiner is concerned only with grading a specific attribute: it looks for the attribute, examines it, writes its comments, and gives points based on its evaluation.
Each examiner simpleton would take pages from its input data pool
(pages could be accessed in random order, or perhaps in the order
in which the collectors fetched them). Once
the simpleton marks up the page (that is, fills in the attribute
values it knows how to calculate), it deposits the page in its output data
pool, which is the input pool for the next examiner simpleton. This simpleton in
turn adds some more attributes. In this manner, when the last simpleton in
the chain of examiner simpletons finishes marking up the page, we have a page
with all the information the "downstream" simpletons
need in the form of (attribute, value) pairs.
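
The examiner chain described above can be sketched as follows. This is a minimal illustration, not the system's implementation: the `Page` class, the `run_examiner_chain` function, and the two example attributes (`word_count`, `has_link`) are hypothetical names chosen for the example, and pools are modeled as plain lists processed in fetch order.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Page:
    """A page plus the (attribute, value) pairs attached by examiners."""
    text: str
    attributes: Dict[str, float] = field(default_factory=dict)


# Hypothetical examiner: the attribute it grades and the grading function.
Examiner = Tuple[str, Callable[[Page], float]]


def run_examiner_chain(input_pool: List[Page],
                       examiners: List[Examiner]) -> List[Page]:
    """Pass every page through each examiner in turn.

    The output pool of one examiner becomes the input pool of the next.
    """
    pool = input_pool
    for attribute, grade in examiners:
        output_pool = []
        for page in pool:
            page.attributes[attribute] = grade(page)  # mark up the page
            output_pool.append(page)                  # deposit downstream
        pool = output_pool
    return pool


# Two illustrative examiners, each grading exactly one attribute.
examiners = [
    ("word_count", lambda p: float(len(p.text.split()))),
    ("has_link", lambda p: 1.0 if "http" in p.text else 0.0),
]
pages = run_examiner_chain([Page("see http://example.org for details")],
                           examiners)
```

After the last examiner runs, `pages[0].attributes` holds one entry per examiner, which is the form the downstream simpletons would read.
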
There will be one clusterer simpleton for each cluster in the system. Every new
page that comes into the system is examined by each clusterer simpleton to
determine how similar it is to the pages in its present cluster. Only if it
is found to be sufficiently similar is it added to its cluster. Thus,
one page could potentially be added to multiple clusters if multiple
clusterer simpletons found it similar. Pages rejected by all clusterer
simpletons could accumulate, for example, in a separate miscellaneous
cluster. Pages in this miscellaneous cluster would be examined by
filters later on to determine whether they should be retained in the
system. The cluster managers could then create a new cluster for these
pages if needed (what is the exact mechanism to handle the
miscellaneous cluster? Is it a good idea to let the cluster manager
deal with it after the filters have marked those pages?).
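
A minimal sketch of this acceptance rule is given below. The similarity measure (average Jaccard overlap of word sets), the threshold of 0.3, and the names `Clusterer`, `offer`, and `assign` are all assumptions made for illustration; the text does not fix any of them.

```python
from typing import List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two word sets, in [0, 1]."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


class Clusterer:
    """One clusterer simpleton guarding one cluster (hypothetical design)."""

    def __init__(self, name: str, seed_pages: List[Set[str]],
                 threshold: float = 0.3):
        self.name = name
        self.pages = list(seed_pages)
        self.threshold = threshold  # assumed cut-off for "sufficiently similar"

    def similarity(self, page: Set[str]) -> float:
        """Average similarity of the page to the current cluster members."""
        if not self.pages:
            return 0.0
        return sum(jaccard(page, m) for m in self.pages) / len(self.pages)

    def offer(self, page: Set[str]) -> bool:
        """Add the page to the cluster only if it is similar enough."""
        if self.similarity(page) >= self.threshold:
            self.pages.append(page)
            return True
        return False


def assign(page: Set[str], clusterers: List[Clusterer],
           miscellaneous: List[Set[str]]) -> List[str]:
    """Show the page to every clusterer; unclaimed pages go to miscellaneous."""
    accepted = [c.name for c in clusterers if c.offer(page)]
    if not accepted:
        miscellaneous.append(page)
    return accepted


programming = Clusterer("programming", [{"python", "code", "tutorial"}])
cooking = Clusterer("cooking", [{"recipe", "pasta", "sauce"}])
misc: List[Set[str]] = []

first = assign({"python", "code", "examples"}, [programming, cooking], misc)
second = assign({"football", "scores"}, [programming, cooking], misc)
```

Because every clusterer sees every page independently, a page can land in several clusters, and a page claimed by none ends up in `misc`.
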
Clusters would have to be managed and updated to reflect changes
in information and the relationship between them. Hence, cluster
merger simpletons would decide whether two clusters are similar enough to be
merged and cluster splitter simpletons would decide if one cluster
contained information dissimilar enough to go into separate clusters.
Cluster creator simpletons would decide if a new cluster needed to be
created and cluster killer simpletons would decide if a cluster should
be removed.
The filters compare new pages against particular lighthouses. Each filter simpleton is concerned with guessing how much the user might like a certain page. The idea here is that each filter is an expert on a certain kind of information. One filter, for example, might be an expert in identifying spam. This simpleton would look for spam email, news postings, and so on, and add a comment saying that the page should not be seen by the user. Another filter might be an expert in identifying pages dealing with a known interest area and would strongly vote to include such pages.
Each filter might "express its opinion"
by adding an attribute with its identifier and a likeness value (say,
a real number between 0 and 1 indicating how much the filter liked the
page). Filters use the attribute information gathered by the examiners to
make their decisions. They are also influenced by the filter specifications,
which are written by the user model.
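
One way a filter could record its opinion is sketched here. The attribute key format `filter:<id>`, the `make_filter` helper, and the `spam_score` and `topic_match` attributes are hypothetical; the judging functions stand in for the filter specifications that the user model would write.

```python
from typing import Callable, Dict, List

Page = Dict[str, float]  # attribute name -> value, as left by the examiners


def make_filter(filter_id: str,
                judge: Callable[[Page], float]) -> Callable[[List[Page]], List[Page]]:
    """Build a filter that tags each page with its likeness value in [0, 1]."""
    def apply(input_pool: List[Page]) -> List[Page]:
        output_pool = []
        for page in input_pool:
            likeness = min(1.0, max(0.0, judge(page)))  # clamp to [0, 1]
            page[f"filter:{filter_id}"] = likeness
            output_pool.append(page)  # every page is passed on, even at 0.0
        return output_pool
    return apply


# Assumed judging rules standing in for user-model specifications.
spam_filter = make_filter(
    "spam", lambda p: 0.0 if p.get("spam_score", 0.0) > 0.8 else 0.5)
interest_filter = make_filter(
    "interest", lambda p: p.get("topic_match", 0.0))

pool = [
    {"spam_score": 0.95, "topic_match": 0.1},
    {"spam_score": 0.05, "topic_match": 0.9},
]
pool = interest_filter(spam_filter(pool))
```

Each filter only adds its own attribute, so later simpletons can weigh the individual opinions however they choose.
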
Note that if a filter simpleton rejects a page outright,
the page is still written to the filter's output pool (that is, it is not
"deleted" by preventing its flow to the next simpleton).
The mapper simpletons map each page to an (x, y, z) coordinate on a virtual screen. This position should reflect the page's content: pages similar in content should be nearby. There should be some empty space between clusters. Note that this (x, y, z) coordinate does not necessarily (indeed, it most likely will not) correspond to the actual position in the interface window where the visual page representation will be drawn. It is just information that gives the display an idea of how close the pages should be to each other.
Mappers convert a space of k dimensions
(k being the number of attributes) to a space of 3 dimensions (the x, y, and z
coordinates). The cluster information needs to be used as well.
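
To make the k-to-3 reduction concrete, the sketch below uses a fixed random linear projection. This is a stand-in technique chosen only because it is short; the text does not prescribe a projection method, and this sketch ignores the cluster information that a real mapper would also need. The name `make_mapper` and the attribute names are hypothetical.

```python
import random
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float, float]


def make_mapper(attribute_names: List[str],
                seed: int = 0) -> Callable[[Dict[str, float]], Point]:
    """Build a mapper from k attribute values to an (x, y, z) coordinate."""
    rng = random.Random(seed)  # fixed seed so the layout is reproducible
    # One row of k weights for each of the three output axes.
    weights = [[rng.gauss(0.0, 1.0) for _ in attribute_names]
               for _ in range(3)]

    def to_xyz(page: Dict[str, float]) -> Point:
        # Missing attributes are treated as 0.0 (an assumption).
        vector = [page.get(name, 0.0) for name in attribute_names]
        x, y, z = (sum(w * v for w, v in zip(row, vector))
                   for row in weights)
        return (x, y, z)

    return to_xyz


mapper = make_mapper(["word_count", "has_link", "spam_score", "topic_match"])
point = mapper({"word_count": 4.0, "has_link": 1.0, "topic_match": 0.9})
```

Because the projection is linear, pages with identical attribute values always land on the same point, and small attribute differences produce small displacements.
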
Pollster simpletons gather data about the pages.
This data deals with the user's interactions with the system: the pages
that have been visited, the time spent on these pages, and so on.
The pollsters use this information to identify lighthouses.
These are pages that
the user visits frequently or has explicitly indicated a liking for.
Lighthouses can be used to gauge the usefulness of other pages.
They can also potentially
be used as centers around which clusters are formed, or fixed places around
which the other pages are arranged by the mappers. Pollsters also give
their assessment of how good a page is based on feedback from user interaction.
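
A pollster's two jobs, identifying lighthouses and scoring pages from user feedback, might look like the sketch below. The `Interaction` record, the visit threshold of 3, and the 300-second saturation point are illustrative assumptions, not values taken from the text.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Interaction:
    """One recorded visit to a page (hypothetical log format)."""
    page_id: str
    seconds_spent: float
    liked: bool = False  # explicit "I like this" signal from the user


def find_lighthouses(log: List[Interaction], min_visits: int = 3) -> List[str]:
    """Pages visited frequently or explicitly liked by the user."""
    visits: Dict[str, int] = {}
    liked = set()
    for event in log:
        visits[event.page_id] = visits.get(event.page_id, 0) + 1
        if event.liked:
            liked.add(event.page_id)
    return sorted(p for p in visits if visits[p] >= min_visits or p in liked)


def pollster_score(log: List[Interaction], page_id: str,
                   saturation_seconds: float = 300.0) -> float:
    """Score in [0, 1] based on total time the user spent on the page."""
    total = sum(e.seconds_spent for e in log if e.page_id == page_id)
    return min(1.0, total / saturation_seconds)


log = [
    Interaction("a", 60.0),
    Interaction("a", 120.0),
    Interaction("a", 30.0),
    Interaction("b", 10.0, liked=True),
    Interaction("c", 5.0),
]
lighthouses = find_lighthouses(log)
```

Here page `a` qualifies through repeated visits and page `b` through an explicit like, while `c` does not qualify on either count.
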
Purgers roam the space of pages trying to gather evidence of the uselessness of a page. Each purger examines a page and looks for bad attributes, such as a rejection of the page by all filters. Another purger might look for pages that were accepted but have registered little or no user activity (such a page would have a poor score from one or more pollster simpletons). Each time a purger encounters such a page, it does not write the page to its output pool. All other pages are written to its output pool. This is how pages get "deleted".
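
Deletion by omission can be sketched as follows. The attribute keys (`filter:<id>`, `pollster:activity`), the activity floor of 0.05, and the helper names are hypothetical choices for the example; only the rule that a purged page is simply not written to the output pool comes from the description above.

```python
from typing import Callable, Dict, List

Page = Dict[str, float]  # attribute name -> value


def filter_scores(page: Page) -> List[float]:
    """Likeness values left on the page by filters (assumed key prefix)."""
    return [v for k, v in page.items() if k.startswith("filter:")]


def rejected_by_all_filters(page: Page) -> bool:
    scores = filter_scores(page)
    return bool(scores) and max(scores) == 0.0


def accepted_but_ignored(page: Page, floor: float = 0.05) -> bool:
    """Accepted by some filter, yet a pollster recorded almost no activity."""
    accepted = any(v > 0.0 for v in filter_scores(page))
    return (accepted
            and "pollster:activity" in page
            and page["pollster:activity"] < floor)


def make_purger(is_useless: Callable[[Page], bool]) -> Callable[[List[Page]], List[Page]]:
    """Build a purger that withholds useless pages from its output pool."""
    def purge(input_pool: List[Page]) -> List[Page]:
        return [page for page in input_pool if not is_useless(page)]
    return purge


pool = [
    {"filter:spam": 0.0, "filter:interest": 0.0},
    {"filter:spam": 0.5, "filter:interest": 0.8, "pollster:activity": 0.0},
    {"filter:spam": 0.5, "filter:interest": 0.9, "pollster:activity": 0.7},
]
for purge in (make_purger(rejected_by_all_filters),
              make_purger(accepted_but_ignored)):
    pool = purge(pool)
```

Only the third page survives both purgers. A page with no pollster score yet is kept, on the assumption that it has not had a chance to attract activity.
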