my stuff

what's new with me

Sunday, July 24, 2005

Berkeley DB is my mortal enemy

I wasted a lot of time today on the following little snippet of Perl code:

if( not defined $dict{$node1} ) {
    $dict{$node1} = $current_id;
    $current_id++;
}

die "Broken" if not defined $dict{$node1};


Essentially, I was going through a file full of strings and assigning each string a unique ID number. I was storing the correspondences between the ID numbers and strings in the dictionary %dict, which I had tied to an oh-so-handy Berkeley DB earlier in the code:

tie( %dict, 'DB_File', $dict_filename,
     O_RDWR|O_CREAT, 0640, $DB_BTREE ) or die;


We're talking something on the order of 5 million unique strings, but no problem; I'm using a btree, so there should be plenty of room for them all. But then I ran into the problem above - I define $dict{$node1}, but then only a few minutes later it becomes magically undefined. The snippet above would die at a certain point, which should be impossible. What?

The problem turned out to be that the key, $node1, was too long (it could be up to around 2k bytes in length). Even when I used the alternate "method call" interface to store the key, the returned status still didn't reflect an error, i.e.

$x = tie( .... );
$status = $x->put( $node1, $current_id );


would leave $status with 0 (success!) while silently failing to do anything. It wasn't until I retrieved the key later with $x->get(..) that Berkeley DB would let on that something wasn't right. You're giving me an undefined value? I think I can figure out for myself that something's wrong, thanks.
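Since the status code can't be trusted here, one workaround is to read every value back right after writing it. Here's a minimal sketch of that idea; store_checked and id_for are names I made up for illustration, not part of DB_File, and I'm assuming an immediate read-back is enough to expose the failure (it was for the "Broken" check above).

```perl
use strict;
use warnings;

# Hypothetical helper: store a key/value pair in a (possibly tied) hash,
# then read it back immediately and die if the value did not stick.
sub store_checked {
    my ($hash, $key, $value) = @_;
    $hash->{$key} = $value;
    my $got = $hash->{$key};
    die sprintf("store failed for a key of %d bytes\n", length $key)
        unless defined $got && $got eq $value;
    return $value;
}

# Hypothetical helper: return the existing ID for a string, or assign
# the next free ID (and bump the counter) if the string is new.
sub id_for {
    my ($dict, $next_id_ref, $node) = @_;
    return $dict->{$node} if defined $dict->{$node};
    return store_checked($dict, $node, $$next_id_ref++);
}
```

Calling id_for(\%dict, \$current_id, $node1) would then replace the if-block at the top. It takes a hash reference, so it works the same on a tied hash or a plain one, and the error at least tells you how long the offending key was.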

Anyway, it boiled down to increasing the page size of the database to 8k. It's horribly slow now, but at least it works. Good thing I only have to do this calculation once (hopefully).
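For the record, here is roughly what setting the page size looks like. This is a sketch rather than my exact code: the temp-file path is just for illustration, and I'm assuming the page size has to be set before the database file is first created, since Berkeley DB fixes it at creation time.

```perl
use strict;
use warnings;
use Fcntl;                      # O_RDWR, O_CREAT
use DB_File;
use File::Temp qw(tempdir);

# Illustration only: put the database in a throwaway directory.
my $dir           = tempdir( CLEANUP => 1 );
my $dict_filename = "$dir/dict.db";

# Use a private BTREEINFO object so the shared $DB_BTREE defaults stay untouched.
my $btree = DB_File::BTREEINFO->new;
$btree->{psize} = 8192;         # page size in bytes (8k)

my %dict;
tie( %dict, 'DB_File', $dict_filename, O_RDWR|O_CREAT, 0640, $btree )
    or die "Cannot tie $dict_filename: $!";

# Round-trip a key of about 2k bytes to confirm it is stored and readable.
my $long_key = 'x' x 2048;
$dict{$long_key} = 1;
die "Long key did not round-trip" unless defined $dict{$long_key};

untie %dict;
```

The round-trip check at the end is the important part: run it once with the longest key you expect before committing to a multi-hour job.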

1 Comment:

  • At 9:58 PM, Anonymous said…

    So an ordinary Perl hash wouldn't do the job?
    -Apple

     
