Next: Tradeoffs Up: A Research Agent Architecture Previous: Replaying Data Streams


Table 1: Primitive operations defined on Resources
(Table body not recoverable from the source; its only surviving entry ends "... the type of resource this is.")

Most of Calvin is written in Java, which allows it to be developed and tested on Windows, Unix, and Linux. Inheritance among the classes allows components to be extended and recombined for different research tasks. The goal is to maximize component reuse, so that when the research goals shift slightly, obsolete components can be deleted without extensive rewrites of the remaining components. This is achieved in part by abstracting both resource types and data analysis components. Because Calvin defines abstract operations over resources, the specific kinds of resources it uses can change without requiring changes to Calvin itself. The primitive operations defined for a resource are shown in Table 1. As described previously, a new resource type is added to Calvin by writing a class that implements these operations for that type. Adding this class to the registry integrates the new type into Calvin without changing Calvin itself.
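The registry pattern described above can be sketched in Java as follows. Table 1's full list of operations is not reproduced in this section, so every name here (Resource, getType, getIdentifier, ResourceRegistry, WebPage) is a hypothetical illustration rather than Calvin's actual API; only getType mirrors the surviving description "the type of resource this is."

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical resource abstraction; method names are assumptions.
interface Resource {
    String getType();        // the type of resource this is
    String getIdentifier();  // e.g., a URL or file path (assumed operation)
}

// Hypothetical registry mapping a type name to a factory for that type.
final class ResourceRegistry {
    private final Map<String, Function<String, Resource>> factories = new HashMap<>();

    // Integrating a new resource type is a single registration call.
    void register(String type, Function<String, Resource> factory) {
        factories.put(type, factory);
    }

    // Builds a resource of a registered type; callers never name the concrete class.
    Resource create(String type, String identifier) {
        Function<String, Resource> factory = factories.get(type);
        if (factory == null) {
            throw new IllegalArgumentException("Unregistered resource type: " + type);
        }
        return factory.apply(identifier);
    }
}

// Example of a new resource type: one class implementing the operations.
final class WebPage implements Resource {
    private final String url;

    WebPage(String url) { this.url = url; }

    public String getType() { return "web-page"; }
    public String getIdentifier() { return url; }
}
```

Registering the type with `registry.register("web-page", WebPage::new)` is the only step needed before `registry.create("web-page", url)` can produce instances, which is the sense in which a new type can be added without changing the code that uses resources.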

Table 2: Primitive operations defined on contexts. As described in the text, "Context" in this table refers to a stream of data. A data analysis component processes this stream to learn context-based indexing features for the documents the user accesses.

To abstract the data analysis component, we assume that every implementation of a data analysis component analyzes a stream of data, which we call a context. A data analysis component can then be defined abstractly as a set of operations over a context; Table 2 lists these operations. Because every data analysis component is a class implementing these same operations, the rest of Calvin can work with multiple, diverse implementations without modification.
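A minimal sketch of this idea, assuming a context is a stream of terms: the interface and both implementations below are hypothetical (Table 2's actual operation names are not reproduced here), and the recency-weighted analyzer is simply a second, swapped-in example of a different component.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical data analysis component; operation names are assumptions.
interface ContextAnalyzer {
    void reset();                    // begin analyzing a new context (data stream)
    void observe(String term);       // consume the next item in the stream
    Map<String, Double> features();  // current context-based indexing features
}

// Implementation 1: relative term frequency over the stream seen so far.
final class TermFrequencyAnalyzer implements ContextAnalyzer {
    private final Map<String, Integer> counts = new HashMap<>();
    private int total = 0;

    public void reset() { counts.clear(); total = 0; }

    public void observe(String term) {
        counts.merge(term, 1, Integer::sum);
        total++;
    }

    public Map<String, Double> features() {
        Map<String, Double> out = new HashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            out.put(e.getKey(), e.getValue() / (double) total);
        }
        return out;
    }
}

// Implementation 2: exponentially decayed weights that favor recent terms.
final class RecencyAnalyzer implements ContextAnalyzer {
    private final double decay;
    private final Map<String, Double> weights = new HashMap<>();

    RecencyAnalyzer(double decay) { this.decay = decay; }

    public void reset() { weights.clear(); }

    public void observe(String term) {
        weights.replaceAll((k, v) -> v * decay); // age every term seen so far
        weights.merge(term, 1.0, Double::sum);   // then credit the current term
    }

    public Map<String, Double> features() { return new HashMap<>(weights); }
}

// Caller code depends only on the interface, so analyzers are interchangeable.
final class ContextRunner {
    static Map<String, Double> run(ContextAnalyzer analyzer, List<String> stream) {
        analyzer.reset();
        for (String term : stream) {
            analyzer.observe(term);
        }
        return analyzer.features();
    }
}
```

Replacing `new TermFrequencyAnalyzer()` with `new RecencyAnalyzer(0.5)` changes the learned features but requires no change to `ContextRunner`, which is the kind of substitution the abstraction is meant to allow.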

The only part of Calvin not written in Java is the Postgresql database used to store data for later retrieval and analysis. Postgresql is an open source, object-relational SQL database server. Calvin components communicate with it through a JDBC driver over a TCP/IP connection. Although requiring a database server makes Calvin harder to deploy widely to individual users, the SQL database makes it much easier to store and analyze large amounts of data. Data in Postgresql can be retrieved in several ways: Postgresql's ODBC driver lets one import the data into Microsoft Access or SPSS, the JDBC driver gives access from Java, and the command-line SQL client included with Postgresql (psql) gives direct access.
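The JDBC path can be sketched with the standard `java.sql` API. The database name `calvin`, the table `access_log`, and its `user_id` column are hypothetical; the URL form `jdbc:postgresql://host:port/database` is the PostgreSQL JDBC driver's format, and actually connecting requires that driver on the classpath and a running server.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical query helper; database, table, and column names are assumptions.
final class AccessLogQuery {
    // PostgreSQL JDBC URL; the driver connects to the server over TCP/IP.
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:postgresql://" + host + ":" + port + "/" + database;
    }

    // Counts recorded document accesses for one user (hypothetical table).
    static int countAccesses(Connection conn, String userId) throws SQLException {
        String sql = "SELECT COUNT(*) FROM access_log WHERE user_id = ?";
        try (PreparedStatement stmt = conn.prepareStatement(sql)) {
            stmt.setString(1, userId); // bind the parameter rather than concatenating SQL
            try (ResultSet rs = stmt.executeQuery()) {
                rs.next(); // COUNT(*) without GROUP BY returns exactly one row
                return rs.getInt(1);
            }
        }
    }

    // Opens a connection on PostgreSQL's default port, runs the query, and closes it.
    static int countFor(String host, String user, String password, String userId)
            throws SQLException {
        try (Connection conn = DriverManager.getConnection(
                jdbcUrl(host, 5432, "calvin"), user, password)) {
            return countAccesses(conn, userId);
        }
    }
}
```

The same query could be typed directly into psql, which is the practical benefit of keeping the data in a general-purpose SQL server.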

Figure 4: Postgresql Text Interface

Our infrastructure also includes utilities that process the data collected by Calvin to show how our information retrieval algorithm would have performed had the users been using an agent with a given data analysis component. Figure 4 shows the results of one such experiment, in which the average performance of the term frequency/inverse document frequency (TF/IDF) indexing algorithm is compared with that of our current data analysis component. In this example, the data are grouped by user. Because the data are stored in the database, however, one could just as easily group by document, document length, or any other attribute to perform many other analyses.
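The flexibility of grouping can be illustrated with a small in-memory sketch. It assumes replay results have already been loaded as rows carrying a per-access score for each method; the `ReplayRow` fields and the `averageBy` helper are hypothetical, and in practice the same analysis could be done with a SQL `GROUP BY` in Postgresql.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.function.ToDoubleFunction;
import java.util.stream.Collectors;

// Hypothetical replay result: one document access scored by two indexing methods.
final class ReplayRow {
    final String user;
    final String document;
    final int documentLength;
    final double tfidfScore;   // score under the TF/IDF baseline (assumed field)
    final double currentScore; // score under the current component (assumed field)

    ReplayRow(String user, String document, int documentLength,
              double tfidfScore, double currentScore) {
        this.user = user;
        this.document = document;
        this.documentLength = documentLength;
        this.tfidfScore = tfidfScore;
        this.currentScore = currentScore;
    }
}

final class ReplayAnalysis {
    // Averages one score per group. The grouping key is a parameter, so the same
    // code groups by user, document, document length, or any other attribute.
    static <K extends Comparable<K>> Map<K, Double> averageBy(
            List<ReplayRow> rows,
            Function<ReplayRow, K> key,
            ToDoubleFunction<ReplayRow> score) {
        return rows.stream().collect(Collectors.groupingBy(
                key, TreeMap::new, Collectors.averagingDouble(score)));
    }
}
```

For example, `averageBy(rows, r -> r.user, r -> r.tfidfScore)` gives a per-user average for the baseline, and changing only the key to `r -> r.documentLength` regroups the same data by document length.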

Travis Bauer