Simpleton Roles

Collectors

These simpletons roam information spaces (the web, FTP sites, the desktop, mail and news servers, and so on), trying to fetch information that might interest the user. The simpletons determine what would interest the user from their specification pages. These pages are written, and constantly revised, by the user model to reflect the user's preferences when using the system.

Collector simpletons specialize in a specific information space. Thus, web simpletons would be concerned with looking for interesting things on the web. Email simpletons would examine only the user's mail. News simpletons might constantly poll a news server for information the user might find interesting, and so on.
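One way to picture "one collector per information space, guided by a specification page" is a small class hierarchy. This is a minimal sketch, assuming the specification page boils down to a keyword set and that a mailbox can be represented as a list of strings; the names `Collector`, `MailCollector`, `spec_keywords` and `collect` are hypothetical, not part of the system described here.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator


@dataclass
class Page:
    source: str                                     # information space, e.g. "mail"
    text: str
    attributes: dict = field(default_factory=dict)  # filled in later by examiners


class Collector:
    """Hypothetical base class: each subclass covers one information space."""

    space = "generic"

    def __init__(self, spec_keywords: Iterable[str]):
        # Keywords read from the collector's specification page, which the
        # user model is assumed to keep up to date.
        self.spec_keywords = {k.lower() for k in spec_keywords}

    def items(self) -> Iterator[str]:
        raise NotImplementedError  # each subclass knows how to poll its own space

    def collect(self) -> list[Page]:
        found = []
        for text in self.items():
            # Keep an item if it mentions at least one keyword from the spec page.
            if set(text.lower().split()) & self.spec_keywords:
                found.append(Page(source=self.space, text=text))
        return found


class MailCollector(Collector):
    space = "mail"

    def __init__(self, spec_keywords: Iterable[str], mailbox: list[str]):
        super().__init__(spec_keywords)
        self.mailbox = mailbox  # hypothetical stand-in for a real mail server connection

    def items(self) -> Iterator[str]:
        return iter(self.mailbox)


mail = MailCollector(["clustering", "agents"], ["Meeting about agents", "Lunch menu"])
print([p.text for p in mail.collect()])  # ['Meeting about agents']
```

A web or news collector would override only `items()`; the keyword matching stays shared, which mirrors the idea that collectors differ in where they look, not in how they read their specifications.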

Examiners

Examiner simpletons analyze the incoming data and associate with it a set of attributes that describe it. Each examiner is concerned with grading only one specific attribute: it looks for that attribute, examines it, writes its comments, and assigns points based on its evaluation.

Each examiner simpleton reads pages from its input data pool (pages could be taken from the pool at random, or perhaps in the order in which the collectors fetched them). Once the simpleton marks up a page (that is, fills in the attribute values it knows how to calculate), it deposits the page in its output data pool, which is the input pool for the next examiner simpleton. That simpleton in turn adds more attributes. In this manner, when the last simpleton in the chain of examiners finishes marking up the page, the page carries all the information the "downstream" simpletons need, in the form of (attribute, value) pairs.
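The chain of pools can be sketched as a sequence of queues, where each examiner drains its input pool and fills the next one. This is an illustrative sketch, assuming pages are plain dictionaries of (attribute, value) pairs, pools are in-process FIFO queues, and pages are read in fetch order; the attribute names `word_count` and `has_link` and the helper `run_chain` are hypothetical.

```python
from collections import deque
from typing import Callable

Page = dict  # a page is modelled as a dict of (attribute, value) pairs


class Examiner:
    """Hypothetical examiner simpleton: grades exactly one attribute."""

    def __init__(self, attribute: str, grade: Callable[[Page], float]):
        self.attribute = attribute
        self.grade = grade

    def run(self, input_pool: deque, output_pool: deque) -> None:
        # Pages are taken in the order they arrived (FIFO);
        # random access would be an equally valid choice.
        while input_pool:
            page = input_pool.popleft()
            page[self.attribute] = self.grade(page)  # mark up the page
            output_pool.append(page)                 # next examiner's input pool


def run_chain(examiners: list[Examiner], pages: list[Page]) -> list[Page]:
    pool = deque(pages)
    for examiner in examiners:
        next_pool = deque()
        examiner.run(pool, next_pool)
        pool = next_pool
    return list(pool)


chain = [
    Examiner("word_count", lambda p: len(p["text"].split())),
    Examiner("has_link", lambda p: 1.0 if "http" in p["text"] else 0.0),
]
pages = run_chain(chain, [{"text": "see http://example.org for details"}])
print(pages[0])
# {'text': 'see http://example.org for details', 'word_count': 4, 'has_link': 1.0}
```

Because each examiner only adds keys, the order of examiners in the chain matters only when one attribute is computed from another.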

Clusterers

There will be one clusterer simpleton for each cluster in the system. Every new page that enters the system is examined by each clusterer simpleton to determine how similar it is to the pages already in its cluster. The page is added to the cluster only if it is found to be sufficiently similar. Thus, one page could be added to multiple clusters if multiple clusterer simpletons find it similar. Pages rejected by all clusterer simpletons could accumulate, for example, in a separate miscellaneous cluster. Pages in this miscellaneous cluster would later be examined by filters to determine whether they should be retained in the system. The cluster managers could then create a new cluster for these pages if needed. (What exactly is the mechanism for handling the miscellaneous cluster? Is it a good idea to let the cluster managers deal with it after the filters have marked those pages?)
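A clusterer's accept/reject decision can be sketched with a similarity measure and a threshold. This sketch assumes that a page's numeric attributes form a sparse vector, that "sufficiently similar" means a cosine similarity to the cluster centroid of at least a hypothetical `threshold` of 0.8, and that each cluster starts from one seed page; cosine similarity is one plausible choice, not necessarily the measure intended here.

```python
import math


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class Clusterer:
    """Hypothetical clusterer simpleton: owns exactly one cluster."""

    def __init__(self, name: str, seed: dict[str, float], threshold: float = 0.8):
        self.name = name
        self.threshold = threshold
        self.members = [seed]  # a cluster starts from one seed page

    def centroid(self) -> dict[str, float]:
        keys = {k for m in self.members for k in m}
        n = len(self.members)
        return {k: sum(m.get(k, 0.0) for m in self.members) / n for k in keys}

    def offer(self, page: dict[str, float]) -> bool:
        # Add the page only if it is sufficiently similar to the cluster so far.
        if cosine(page, self.centroid()) >= self.threshold:
            self.members.append(page)
            return True
        return False


def assign(page, clusterers, misc):
    # Every clusterer sees the page, so it may join several clusters.
    accepted = [c.name for c in clusterers if c.offer(page)]
    if not accepted:
        misc.append(page)  # miscellaneous cluster, left for filters and managers
    return accepted


sports = Clusterer("sports", {"football": 1.0, "score": 1.0})
tech = Clusterer("tech", {"python": 1.0, "score": 0.2})
misc = []
print(assign({"football": 1.0, "score": 0.8}, [sports, tech], misc))  # ['sports']
print(assign({"cooking": 1.0}, [sports, tech], misc))                  # []
```

The list comprehension in `assign` calls `offer` on every clusterer rather than stopping at the first match, which is what lets a page belong to more than one cluster.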

Cluster Managers

Clusters would have to be managed and updated to reflect changes in the information and in the relationships between clusters. Hence, cluster merger simpletons would decide whether two clusters are similar enough to be merged, and cluster splitter simpletons would decide whether a cluster contains information dissimilar enough to be split into separate clusters. Cluster creator simpletons would decide whether a new cluster needs to be created, and cluster killer simpletons would decide whether a cluster should die.
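Two of these manager decisions, merging and killing, can be expressed as simple predicates over clusters. This is a sketch under stated assumptions: clusters are non-empty lists of attribute vectors, the merger compares centroids by cosine similarity against a hypothetical threshold of 0.9, and the killer uses a hypothetical minimum cluster size. Splitters and creators would need a full clustering step and are not shown.

```python
import math

Vector = dict[str, float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(cluster: list[Vector]) -> Vector:
    # Assumes a non-empty cluster.
    keys = {k for page in cluster for k in page}
    return {k: sum(p.get(k, 0.0) for p in cluster) / len(cluster) for k in keys}


def should_merge(c1: list[Vector], c2: list[Vector], threshold: float = 0.9) -> bool:
    # Hypothetical merger rule: merge when the two centroids point the same way.
    return cosine(centroid(c1), centroid(c2)) >= threshold


def should_kill(cluster: list[Vector], min_size: int = 2) -> bool:
    # Hypothetical killer rule: a cluster that has shrunk below min_size dies.
    return len(cluster) < min_size


a = [{"x": 1.0, "y": 0.0}, {"x": 0.9, "y": 0.1}]
b = [{"x": 1.0, "y": 0.05}]
c = [{"y": 1.0}]
print(should_merge(a, b), should_merge(a, c))  # True False
print(should_kill(a), should_kill(b))          # False True
```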

Filters

The filters compare new pages against particular lighthouses. Each filter simpleton is concerned with guessing how much the user might like a certain page. The idea is that each filter is an expert on a certain kind of information. One filter, for example, might be an expert at identifying spam. This simpleton would look for spam email, news postings, and so on, and add a comment saying that it strongly believes the user should not see the page. Another filter might be an expert at identifying pages dealing with a known area of interest and would strongly vote to include such pages.

Each filter might "express its opinion" by adding an attribute with its identifier and a likeness value (say, a real number between 0 and 1 indicating how much the filter liked the page). Filters use the attribute information gathered by the examiners to make their decisions. They are also influenced by the filter specifications, which are written by the user model. Note that if a filter simpleton rejects a page outright, the page is still written to its output pool (that is, the page is not "deleted" by blocking its flow to the next simpleton).
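A filter's behavior, recording an opinion but always passing the page on, can be sketched as follows. This assumes pages are dictionaries of examiner attributes and that each opinion is stored under a key of the form `filter:<id>`; that key scheme, the `spam_words` attribute and the scoring rule are hypothetical illustrations.

```python
from typing import Callable

Page = dict  # (attribute, value) pairs filled in by the examiners


class Filter:
    """Hypothetical filter simpleton: adds one expert opinion per page."""

    def __init__(self, filter_id: str, likeness: Callable[[Page], float]):
        self.filter_id = filter_id
        self.likeness = likeness

    def run(self, input_pool: list[Page]) -> list[Page]:
        output_pool = []
        for page in input_pool:
            score = min(1.0, max(0.0, self.likeness(page)))  # clamp to [0, 1]
            page["filter:" + self.filter_id] = score
            output_pool.append(page)  # even a score of 0 does not remove the page
        return output_pool


spam_filter = Filter("spam", lambda p: 0.0 if p.get("spam_words", 0) > 3 else 0.7)
pages = spam_filter.run([{"spam_words": 5}, {"spam_words": 0}])
print([p["filter:spam"] for p in pages])  # [0.0, 0.7]
```

Keeping rejected pages in the flow leaves the actual removal decision to the purgers, which can weigh all filter opinions together.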

Mappers

The mapper simpletons map each page to an (x, y, z) coordinate on a virtual screen. This position should reflect the page's content: pages similar in content should be near each other, and there should be some empty space between clusters. Note that this (x, y, z) coordinate does not necessarily (indeed, most likely will not) correspond to the actual position in the interface window where the page's visual representation will be drawn. It only tells the display how close the pages should be to each other.

Mappers convert a space of k dimensions (k being the number of attributes) to a space of 3 dimensions (the x, y, and z coordinates). The cluster information needs to be used as well, so that pages in the same cluster end up near one another and different clusters remain separated.
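The k-to-3 reduction can be sketched with a random linear projection plus a per-cluster offset. This is a crude stand-in chosen for brevity, assuming numeric attribute values and a known cluster index per page; a method such as PCA or multidimensional scaling would usually preserve distances better. The names `make_projection`, `map_page` and `spacing` are hypothetical.

```python
import random


def make_projection(attributes: list[str], seed: int = 0) -> dict[str, tuple[float, float, float]]:
    # One random 3-vector per attribute: a k x 3 random projection matrix.
    rng = random.Random(seed)
    return {a: (rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)) for a in sorted(attributes)}


def map_page(page: dict[str, float], projection, cluster_index: int, spacing: float = 100.0):
    x = y = z = 0.0
    for attr, value in page.items():
        px, py, pz = projection.get(attr, (0.0, 0.0, 0.0))  # unknown attributes are ignored
        x += value * px
        y += value * py
        z += value * pz
    # Offset each cluster along x so that clusters are separated by empty space.
    return (x + cluster_index * spacing, y, z)


proj = make_projection(["length", "has_link", "topic_score"])
print(map_page({"length": 0.4, "topic_score": 0.9}, proj, cluster_index=0))
print(map_page({"length": 0.4, "topic_score": 0.9}, proj, cluster_index=1))
```

Because the map is linear, pages that are close in attribute space stay close on the virtual screen, though distant pages may also land near each other. The gap between clusters only appears if `spacing` exceeds the spread within each cluster.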

Pollsters

Pollster simpletons gather data about the pages. This data concerns the user's interactions with the system: the pages that have been visited, the time spent on these pages, and so on. The pollsters use this information to identify lighthouses, which are pages that the user visits frequently or has explicitly indicated a liking for. Lighthouses can be used to gauge the usefulness of other pages. They can also potentially be used as centers around which clusters are formed, or as fixed places around which the mappers arrange the other pages. Pollsters also give their assessment of how good a page is based on feedback from user interaction.
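A pollster's two outputs, a set of lighthouses and a per-page usage score, can be sketched from an interaction log. This assumes each log entry records a page identifier, the seconds spent on one visit, and an optional explicit "like"; the visit threshold `min_visits` and the scoring rule (total dwell time relative to the most-used page) are hypothetical choices.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    page_id: str
    seconds: float       # time spent on the page during one visit
    liked: bool = False  # explicit positive feedback from the user


def poll(log: list[Interaction], min_visits: int = 3):
    visits: dict[str, int] = {}
    dwell: dict[str, float] = {}
    liked: set[str] = set()
    for event in log:
        visits[event.page_id] = visits.get(event.page_id, 0) + 1
        dwell[event.page_id] = dwell.get(event.page_id, 0.0) + event.seconds
        if event.liked:
            liked.add(event.page_id)
    # Lighthouses: frequently visited pages or pages the user explicitly liked.
    lighthouses = {p for p, n in visits.items() if n >= min_visits} | liked
    # Hypothetical usage score in [0, 1]: total dwell time relative to the busiest page.
    busiest = max(dwell.values(), default=0.0)
    scores = {p: (t / busiest if busiest else 0.0) for p, t in dwell.items()}
    return lighthouses, scores


log = [Interaction("a", 30), Interaction("a", 20), Interaction("a", 10),
       Interaction("b", 5, liked=True), Interaction("c", 60)]
lighthouses, scores = poll(log)
print(sorted(lighthouses))  # ['a', 'b']
```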

Purgers

Purgers roam the space of pages trying to gather evidence that a page is useless. Each purger examines a page and looks for bad attributes, such as rejection of the page by all filters. Another purger might look for pages that were accepted but have registered little or no user activity (such a page would have a poor score from one or more pollster simpletons). Whenever a purger encounters such a page, it does not write the page to its output pool; all other pages are written to the output pool. This is how pages get "deleted".
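Deletion by omission can be sketched as a chain of predicates, each acting as one purger. This assumes filter opinions are stored under hypothetical `filter:<id>` keys and a pollster score under a hypothetical `pollster:usage` key, and that "rejected" and "accepted" are decided by the illustrative thresholds `reject_below` and `accept_at`.

```python
from typing import Callable

Page = dict  # (attribute, value) pairs accumulated upstream


def filter_scores(page: Page) -> list[float]:
    return [v for k, v in page.items() if k.startswith("filter:")]


def rejected_by_all_filters(page: Page, reject_below: float = 0.1) -> bool:
    scores = filter_scores(page)
    return bool(scores) and all(s < reject_below for s in scores)


def ignored_by_user(page: Page, accept_at: float = 0.5, min_usage: float = 0.05) -> bool:
    # An accepted page (some filter liked it) with little or no recorded activity.
    accepted = any(s >= accept_at for s in filter_scores(page))
    return accepted and page.get("pollster:usage", 0.0) < min_usage


def purge(input_pool: list[Page], is_useless: Callable[[Page], bool]) -> list[Page]:
    # Useless pages are simply not written to the output pool: that is the deletion.
    return [page for page in input_pool if not is_useless(page)]


pool = [
    {"id": 1, "filter:spam": 0.0, "filter:topic": 0.05},  # rejected by all filters
    {"id": 2, "filter:topic": 0.9, "pollster:usage": 0.0},  # accepted but never used
    {"id": 3, "filter:topic": 0.9, "pollster:usage": 0.6},  # accepted and used
]
for purger in (rejected_by_all_filters, ignored_by_user):
    pool = purge(pool, purger)  # each purger's output pool feeds the next
print([p["id"] for p in pool])  # [3]
```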