Gossamer Forum

Advice with DB structure

I wonder if anyone could give me some advice on the following...

I'm creating a program (using Perl due to its excellent pattern matching capabilities) to look up words in a dictionary and return matches that rhyme with the word being searched on.

This is very much oversimplified, but never mind.

What I need to know is where to get an entire dictionary from. Where did GT get theirs for their spell-check program? I take it some are freely available, or did you sit there and type every word in the dictionary into a database?

And... what is the best way to store this information? A table for each letter, with all words beginning with that letter as rows inside? Or would you do columns? Flat files?

How do other spell-check modules handle this? I'm guessing it's the same setup as I would need for this project.

Cheers

Wil

- wil
Re: [Wil] Advice with DB structure In reply to
I take it many scripts take advantage of the *ix program:

/usr/dict/words

?

- wil
Re: [Wil] Advice with DB structure In reply to
Well I guess so seeing as it is full of ...ooo...words
Re: [PaulW] Advice with DB structure In reply to
Yes, but what do you do then? Benchmarking it, I can slurp all the words into a hash that occupies just under 10MB of memory on my Linux box.

That's not bad. But not brilliant if you're running a busy system.

- wil
Re: [Wil] Advice with DB structure In reply to
Ugh, then that's probably a good reason not to 'slurp' it into a hash.

You probably need a while loop and/or grep and 'last' is VITAL if you do it that way.
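
A minimal sketch of the while-loop-plus-'last' approach, for an exact-match lookup. The helper name word_in_file is made up for illustration, and the demo reads from a small in-memory list rather than the real /usr/dict/words:

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $target is a line of the word list, else 0.
# $source is a file path, or a reference to a string (an in-memory "file").
sub word_in_file {
    my ($source, $target) = @_;
    open my $fh, '<', $source or die "Cannot open word list: $!";
    my $found = 0;
    while (my $line = <$fh>) {
        chomp $line;
        if ($line eq $target) {
            $found = 1;
            last;    # stop reading as soon as we have a match
        }
    }
    close $fh;
    return $found;
}

# Demo on a tiny in-memory list; a real run would pass a path instead.
my $words = "apple\nbanana\ncherry\n";
print word_in_file(\$words, 'banana') ? "found\n" : "not found\n";
```

Only one line is held in memory at a time, and 'last' means a hit near the top of the file costs almost nothing. A miss still reads the whole file.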
Re: [PaulW] Advice with DB structure In reply to
Throwing all the words into a hash and then running my regexes against it is the fastest way, surely?

You wouldn't want to open a pipe and use a while on a *ix program?

- wil
Re: [Wil] Advice with DB structure In reply to
The spellcheck module comes with a dictionary file that is basically a b-tree. It has the dictionary (which yes, came from Linux), and given a word, it opens the file and scans it using read/seek. It takes at most 15 seeks of the file to look up a word, and doesn't load the whole thing into memory, so it's quite fast.

It also comes with a soundex file in a similar format, but keyed off the word's soundex. It looks up the soundex to find a list of similar words, and then returns the best matches first.
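
To give a feel for the seek-based lookup, here is a rough sketch. It is not the spellcheck module's code or file format: it swaps in a plain binary search over a sorted text file (one word per line, sorted with Perl's default string sort), and the names line_after and lookup are made up for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: seek to byte $pos and return the first complete line
# after it (the line at offset 0 when $pos is 0), or undef at end of file.
sub line_after {
    my ($fh, $pos) = @_;
    seek($fh, $pos, 0) or die "seek failed: $!";
    if ($pos > 0) {
        my $discard = <$fh>;    # skip the line we landed in the middle of
    }
    my $line = <$fh>;
    chomp $line if defined $line;
    return $line;
}

# Binary search by byte offset: each pass halves the range with one seek.
sub lookup {
    my ($path, $target) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my ($lo, $hi) = (0, -s $path);
    while ($lo < $hi) {
        my $mid  = int(($lo + $hi) / 2);
        my $line = line_after($fh, $mid);
        if (defined $line && $line lt $target) {
            $lo = $mid + 1;     # target can only be further on
        }
        else {
            $hi = $mid;         # target is here or earlier
        }
    }
    my $line = line_after($fh, $lo);
    close $fh;
    return defined $line && $line eq $target ? 1 : 0;
}

# Demo: write a small sorted word list to a temp file, then search it.
my ($out, $path) = tempfile(UNLINK => 1);
print {$out} "$_\n" for sort qw(cat dog emu fox owl);
close $out;
print "emu: ", (lookup($path, 'emu') ? "found" : "not found"), "\n";
print "yak: ", (lookup($path, 'yak') ? "found" : "not found"), "\n";
```

The number of seeks grows with the logarithm of the file size, so nothing close to the whole file is read. The sketch assumes no empty lines and a file sorted the same way 'lt' compares strings; a list sorted case-insensitively would need re-sorting first.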

Cheers,

Alex
--
Gossamer Threads Inc.
Re: [Wil] Advice with DB structure In reply to
>>Throwing all the words into a hash and then running my regexes against it is the fastest way, surely?
<<

Absolutely not, that's probably the slowest way. Did you forget the conversation we had the other day when you yourself said large hashes are very slow?

>>You wouldn't want to open a pipe and use a while on a *ix program?
<<

It isn't a Unix program, it's a list of words :P

http://www.perlmad.com/words
Re: [PaulW] Advice with DB structure In reply to
But hashes are the fastest Perl data structure.

- wil
Re: [Alex] Advice with DB structure In reply to
Thanks for the advice, Alex.

I've got a problem now that I need the dictionary in Welsh. Well, the problem is locating one. Once I've done that I will take your suggestions on board on how to implement this.

Thanks.

- wil
Post deleted by PaulW In reply to

Last edited by:

PaulW: Nov 30, 2001, 4:53 AM
Re: [Wil] Advice with DB structure In reply to
Actually hashes use up much more memory.
Re: [PaulW] Advice with DB structure In reply to
Naturally, because they store more information?

- wil
Re: [Wil] Advice with DB structure In reply to
Eh? lol

How do you work that out?

The amount of information they store depends on your script.

Which stores more info?

$foo{1} = 'Hi'
$foo[1] = 'Hi'

?
Re: [PaulW] Advice with DB structure In reply to
No, hang on. Hash tables are faster because they don't index your data. With arrays it's all in order, so you can pop and push, but with hash tables your data is stored randomly in memory, so they take up less memory?

- wil
Re: [Wil] Advice with DB structure In reply to
Um nooooo. Hashes use up more memory. It's a fact. You can argue with the perldocs if you like, but I wouldn't recommend it ;)
Re: [PaulW] Advice with DB structure In reply to
That's not what it states in the Camel. Hmm.

- wil
Re: [Wil] Advice with DB structure In reply to
Someone is lying :P
Re: [Wil] Advice with DB structure In reply to
OK. Sorry. My head is spinning.

An array of n elements uses less memory than a hash of n key-value pairs.

That statement makes logical sense. However, the Camel urges you to use hashes, and even states that "if you're not using hashes, then you're not even thinking in Perl".

Because... even though hashes take up more memory, they are a million times faster to access by key, which makes them far more efficient if you have a large hash vs. a large array.
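
A small sketch of the difference, assuming the question is "is this exact word in the list?". The word list is generated filler and the helper names in_array and in_hash are made up for the example:

```perl
use strict;
use warnings;

# Filler data standing in for a real dictionary.
my @words = map { "word$_" } 1 .. 50_000;

# Build a hash whose keys are the words (a hash slice with no values).
my %is_word;
@is_word{@words} = ();

# Array version: compare against each element until one matches.
sub in_array {
    my ($w) = @_;
    for my $candidate (@words) {
        return 1 if $candidate eq $w;
    }
    return 0;
}

# Hash version: a single keyed lookup, however many words there are.
sub in_hash {
    my ($w) = @_;
    return exists $is_word{$w} ? 1 : 0;
}

print "array: ", in_array('word49999'), "\n";
print "hash:  ", in_hash('word49999'), "\n";
```

Both give the same answer; the hash just gets there without walking the list. The core Benchmark module can be used to time the two. The advantage only applies to exact-key lookups: running a regex over every key of a hash is still a full scan, the same as running it over an array.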

- wil

Last edited by:

Wil: Nov 30, 2001, 6:12 AM
Re: [Wil] Advice with DB structure In reply to
A hash is only an associative array anyway O:-)

Anyway I agree, hashes are much more flexible, especially when using something like a template parser that requires names and not numbers.

Code:
my $foo = {
    Key1 => [
        { SubKey1 => { SubKey2 => [ { Bar => 'Hello' } ] } },
    ],
};


Now let's see if you can print Hello :D
Re: [PaulW] Advice with DB structure In reply to
Well they are simply much much much faster to access than arrays. I can't put this into English so I'll come up with a code example for you. Hang on...

- wil
Re: [PaulW] Advice with DB structure In reply to
In Reply To:
Now let's see if you can print Hello :D

Code:
$foo{SubKey1}{SubKey2}{Bar}

Edit: Obviously I'd put a print statement in front of that <g>.

- wil

Last edited by:

Wil: Nov 30, 2001, 7:04 AM
Re: [Wil] Advice with DB structure In reply to
>>$foo{SubKey1}{SubKey2}{Bar}
<<<

You are miles off :)

Try:

print $foo->{Key1}->[0]->{SubKey1}->{SubKey2}->[0]->{Bar};

You can shorten it to:

print $foo->{Key1}[0]{SubKey1}{SubKey2}[0]{Bar};

...but I prefer using ->

O:-)

Last edited by:

PaulW: Nov 30, 2001, 7:31 AM
Re: [PaulW] Advice with DB structure In reply to
But my way works - why code more than necessary?

- wil
Re: [Wil] Advice with DB structure In reply to
LOL no your code doesn't work.

For starters you've totally missed the main key - Key1...and the array references.

The second way I provided is the shortest possible way. (I think)
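
For anyone who wants to check, a self-contained version with the structure and both access forms. The variable names $long and $short are just for the example:

```perl
use strict;
use warnings;

# The nested structure from earlier in the thread.
my $foo = {
    Key1 => [
        { SubKey1 => { SubKey2 => [ { Bar => 'Hello' } ] } },
    ],
};

# Explicit arrows at every level.
my $long = $foo->{Key1}->[0]->{SubKey1}->{SubKey2}->[0]->{Bar};

# The arrow is optional between adjacent subscripts.
my $short = $foo->{Key1}[0]{SubKey1}{SubKey2}[0]{Bar};

print "$long\n";
print "$short\n";
```

Both lines print Hello. Note that $foo{SubKey1} refers to a hash named %foo, not to the reference in $foo, so under 'use strict' that version does not even compile.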

Last edited by:

PaulW: Nov 30, 2001, 8:05 AM