Next: Modeling in related work Up: Real Time User Context Previous: Applying WordSieve in Calvin

Performance


 
Table 1: Comparison of TFIDF and WordSieve
(Std. Dev. and Mean are over per-document similarity scores between each document's term vector and its task-description vector; Diff is the relative change in mean from TFIDF to WordSieve.)

                               Std. Dev.            Mean                           ANOVA
                 Num Docs   TFIDF  WordSieve   TFIDF  WordSieve    Diff        F      Significant

Overall             381     0.142    0.170     0.145    0.224    +54.48%    91.14      Yes

By User
  User 1             64     0.130    0.121     0.142    0.182    +28.17%     6.66      Yes
  User 2             44     0.138    0.137     0.161    0.166     +3.11%     0.04      No
  User 3             44     0.146    0.108     0.132    0.119     -9.85%     0.38      No
  User 4             47     0.100    0.158     0.124    0.177    +42.74%    12.03      Yes
  User 5             87     0.117    0.147     0.117    0.277   +136.75%   117.24      Yes
  User 6             62     0.194    0.225     0.207    0.353    +70.53%    28.31      Yes
  User 7             33     0.138    0.122     0.128    0.210    +64.06%     7.80      Yes

By Query
  Genetic           161     0.157    0.191     0.172    0.273    +58.72%    59.20      Yes
  Butterfly         220     0.126    0.142     0.124    0.188    +51.61%    35.76      Yes

By Document Length
  0-1000            112     0.114    0.209     0.097    0.213   +119.59%    71.30      Yes
  1001-2000          67     0.106    0.142     0.133    0.214    +60.90%    15.51      Yes
  2001-3000          50     0.144    0.154     0.147    0.224    +52.38%    11.62      Yes
  3001-4000          49     0.197    0.165     0.298    0.270     -9.40%     1.20      No
  4001-4999         103     0.108    0.146     0.130    0.221    +70.00%    37.74      Yes

To evaluate the performance of WordSieve, we conducted an experiment testing its ability to match a document to a hand-coded vector representation of the web search task during which it was consulted. In particular, we wanted to see how strongly WordSieve could correlate each document with the original search task given to the user. To test this, seven users (computer science graduate students) were each asked to browse the Internet for twenty minutes while being monitored by Calvin. For the first ten minutes, they were asked to find documents on the WWW about "The use of genetic algorithms in artificial life." For the second ten minutes, they were asked to search for information about "Tropical butterflies in Southeast Asia." The users were given no restrictions on how to find the pages; they were only instructed that the documents must be loaded into the web browser provided to them.

User access profiles were developed from this data by passing each user's data through WordSieve three times (in the order originally browsed by the user) to simulate one hour of browsing.

Two term vectors were generated for each document: one using WordSieve and one using TFIDF. To characterize the search tasks, vectors were also created to represent the task descriptions given to the users. The WordSieve and TFIDF vectors for each document were then compared to the vector for their associated task description.
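The paper does not spell out the TFIDF weighting scheme or the vector comparison measure. The following is a minimal sketch of the TFIDF side of the comparison, assuming whitespace tokenization, raw term frequency, the standard log(N/df) inverse document frequency, and cosine similarity; the function names `tfidf_vectors` and `cosine` are illustrative, not from the original system:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build a sparse TFIDF term vector for each document in a corpus.

    Assumes whitespace tokenization and tf * log(N/df) weighting;
    terms appearing in every document get weight log(1) = 0.
    """
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter()                       # document frequency per term
    for toks in tokenized:
        for term in set(toks):
            df[term] += 1
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)               # raw term frequency
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse term vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

In this sketch, a document's TFIDF vector would be scored against a hand-coded task-description vector (e.g. unit weights on the task's keywords), with a higher cosine indicating a closer match to the task.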

In our experiments, WordSieve generated indices (i.e., term vectors) for documents that were reliably more strongly correlated with the original task description than those produced by TFIDF (F(1,82)=91.1, repeated-measures ANOVA). This result generally held across various subsets of the data. The average TFIDF similarity was 0.145 and the average WordSieve similarity was 0.224. This suggests that WordSieve performs better at generating profiles that reflect a user's task.
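The significance tests above used a repeated-measures ANOVA, which accounts for the fact that each document is scored by both algorithms. For intuition about the F statistic itself, here is a sketch of the simpler independent-groups one-way F ratio (between-group variance over within-group variance); it is not the repeated-measures design used in the paper, and the function name `anova_f` is illustrative:

```python
def anova_f(group_a, group_b):
    """One-way ANOVA F statistic for two independent groups.

    F = (between-group mean square) / (within-group mean square).
    Simplified illustration only: the paper's test was a
    repeated-measures ANOVA over paired similarity scores.
    """
    groups = [group_a, group_b]
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g)
                    for g in groups)
    df_between = len(groups) - 1          # k - 1 groups
    df_within = n_total - len(groups)     # N - k observations
    return (ss_between / df_between) / (ss_within / df_within)
```

A large F means the difference between the two groups' mean similarities is large relative to the variation within each group, which is what licenses the "Significant" entries in Table 1.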

In the experiments presented, the tasks were quite distinct; we have not yet conducted experiments on browsing where the tasks overlap. However, if the tasks overlapped, both TFIDF and WordSieve would treat the overlapping keywords as non-discriminators and assign them relatively low values in the generated term vectors, while the non-overlapping keywords would receive relatively high values. Because these terms would have similar effects on both algorithms, we expect that performance would be equally affected in both and that the conclusions of the comparison would be similar.
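The non-discriminator argument can be made concrete with the standard IDF weight: a keyword that appears in documents from every task has document frequency equal to the corpus size, so its weight is log(N/N) = log(1) = 0, regardless of which algorithm's term-selection sits on top. A minimal sketch, assuming log(N/df) IDF (the helper name `idf_weights` is illustrative):

```python
import math
from collections import Counter

def idf_weights(docs):
    """Inverse document frequency per term, idf(t) = log(N / df(t)).

    A term shared by every document scores log(1) = 0, so it is
    effectively discarded as a non-discriminator; terms confined to
    one task's documents keep a high weight.
    """
    n = len(docs)
    df = Counter()                       # document frequency per term
    for doc in docs:
        for term in set(doc.lower().split()):
            df[term] += 1
    return {t: math.log(n / d) for t, d in df.items()}
```

For example, if both the genetic-algorithms task and the butterflies task surfaced pages mentioning "population", that shared term would be down-weighted toward zero for both algorithms, leaving the comparison between them driven by the task-specific vocabulary.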


Travis Bauer
2002-01-25