programming as hinting


For a concrete example, consider spell-checking. This problem is usually assumed to be already well-solved, but there is more to it than simple table-lookup and it's useful to examine the above ideas in a simple and concrete domain first.

The spell-checking problem is this: given an unidentified string, guess what word the user meant to type given the string's textual context, and the user's history of words chosen in varying textual contexts. The information helper must extract information from implicit hints the user provides simply by behaving normally. From extended observation of the user's behavior, the system must then try to deduce what the user currently has in mind.

This is the same problem as deciding what page the user meant to specify when given an incorrect page name (what did the user mean to type?). It is the same problem in image search given multiple image selections (what image is the user trying to specify?). It is the same problem in rule guessing given several example card hands (what rule is the user thinking of?). In all cases, there is a large, possibly infinite, domain of items and the helper's task is to identify which of those possibilities the user has in mind.

The helper's task is not impossible because users are limited in the subset of things they tend to be interested in. Users don't usually type arbitrary strings; they usually work within a relatively small subset of possible strings. Further, their behavior over the past little while provides context which can help determine what they mean. These context cues are both numerous and inadvertent. Further, their abundance is the probable source of widespread user frustration with computers---we see the context easily and can't see why the computer can't too.

The helper's task is to use information from previous behavior to sufficiently bias its own structure so that when the user does something new the helper has some idea what the user meant. In other words, the helper is trying to construct a primitive, and perhaps implicit, dynamic user model.

Such a model must be dynamic because the helper can't remember everything about the user's behavior forever. Each phenomenon must work to bias the helper's structure a little so that the next time it, or something similar, occurs the helper likely has a better idea of what the user meant. However, there is ample opportunity for the helper to develop a hierarchy of memories, stretching over various timespans, where the shortest-term memory carries the most recent, most specific, and most volatile, past, while the longest-term memories are more stable and stretch over longer periods, but aren't very specific. The shortest-term memories work to change what the helper pays attention to, while the longest-term memories work to shape how the helper pays attention.

To solve its problem the helper has to establish a metric on the domain---possibly growing the domain as it sees new elements of the space. This metric gives it some idea of how near and how far elements of the domain are. The relative nearness of two elements tells the system how likely it is that the user said one thing but meant another. Note that this judgment is independent of whether we view the user's actions as "errors to be corrected" (as in a page finder or spell-checker) or as "hints as to ultimate goal" (as in image search or rule guessing) or as "hints to an inarticulable rule" (as in mail or news helpers) or as "hints to a secret rule" (as in game playing).

Suitable metrics on the space can change over time. Further, for any really interesting problem, the helper may need to have several metrics existing simultaneously, some possibly competing, and some at least partly inconsistent. The helper must continually decide which metric should currently have the most say (what is the user's most likely current context?) and whether it should be changed as a consequence of the latest input. For example, in image search, one metric might be how near or far two images are in overall redness. The system might need to know this to determine if the images so far selected by the user are increasing in redness value.

If there are multiple metrics, how can the helper tell which ones are most important at any given time? To do so it can only rely on context. If in the past, the user's behavior has correlated a certain metric with a certain action (or direction or goal or rule or string or decision, or whatever else the helper is trying to predict) then in the future, if that context is again recognized, the system should weight that particular metric higher than normal. In other words, metrics should be context-switchable.

Here's a example: in the context of C programming, "i" and "j" are quite close because "i" and "j" are often used to control loops. But in the domain of word recognition, "i" and "y" might be much closer than "i" and "j" because their sounds are closer, so the user might be more likely to slip between one and the other. In such cases, the user is less likely to slip between "i" and "j" than between "i" and "y". Similarly, in the domain of spell-checking "i" and "u" are closer on the keyboard than "i" and "y" so the user is more likely to type one for the other. Each context needs a different metric between letters. Each of the actors in the system---that is, each subsystem implementing a particular metric---should have one of these as its most probable metric.

Or consider page finding, where, depending on what directory a user is in, that user could be interested in different sets of pages. For example, someone in a "courses" subdirectory, is probably doing course-related things. That user is more likely to be interested in the subset of pages that have to do with courses---and those pages might happen to be scattered around the directory structure. That subset of pages is not going to be the same as the pages and directories the user is most likely to reference when the user is in the "books" subdirectory. The helper can notice correlations of this kind if it watches the user's behavior over many sessions. It's going to see that when the user is in one directory the user is more likely to reference a certain subset of pages and when in another the user is more likely to reference another subset. Of course, the user could always reference any arbitrary page but the likelihood of that is small.

This approach to spell-checking (or page finding and so on) turns the programming task from one of finding fixed, deterministic, pre-specified actions to a more flexible one biased by the user's actions. The users actions work to reinforce various hints (actors within the spell-checker) and they in turn work to influence the choices the spell-checker as a whole makes.



last | | to sitemap | | up one level | | next