Web Mining Project Blog

Sunday, February 27, 2005

gobbler

We've (sort of) finished a preliminary version of a blog crawler, which we're calling "gobbler". It crawls pages and follows only links of the following forms:

  • www.livejournal.com/users/somebody
  • somebody.livejournal.com
  • somebody.blogspot.com
Of course, links to specific posts etc. are automatically stripped down to a link to the blog itself. This won't catch blogs that aren't hosted on these standard sites, but we're hoping enough blogs are hosted there that the limitation won't be serious.
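
For the curious, here is a rough sketch of what the stripping step could look like in Python. This is not gobbler's actual code: the function name `blog_root`, the `RESERVED` set, and the choice to normalize everything to `http://` are all assumptions made up for illustration.

```python
import re
from urllib.parse import urlsplit

# Subdomains that are the hosting site itself rather than a blog
# (hypothetical list, for illustration only).
RESERVED = {"www"}

def blog_root(url):
    """Strip a link down to its blog's front page.

    Returns None if the link is not one of the three accepted forms.
    """
    # Links scraped from pages sometimes lack a scheme; assume http.
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()

    # Form 1: www.livejournal.com/users/somebody
    if host == "www.livejournal.com":
        m = re.match(r"/users/([A-Za-z0-9_-]+)", parts.path)
        if m:
            return "http://www.livejournal.com/users/" + m.group(1)
        return None

    # Forms 2 and 3: somebody.livejournal.com, somebody.blogspot.com
    m = re.fullmatch(r"([a-z0-9_-]+)\.(livejournal|blogspot)\.com", host)
    if m and m.group(1) not in RESERVED:
        return "http://" + host

    # Anything else is not a blog we recognize, so the crawler skips it.
    return None
```

With something like this, a link to a single post such as `somebody.blogspot.com/2005/02/post.html` and a link to the blog's front page both collapse to the same URL, so the crawler only has to remember each blog once.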

Now to clean it up a little.
