Gossamer Forum

Re: Split the DMOZ data?

As for the program I use, it's a hack (and a bad one!). It's pretty dumb: it just brute-force opens the file, looks for certain strings, and writes out a new file. I have to run it multiple times, when really I should be able to run it once and generate all the cuts, but I never had the time to do it.
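
For comparison, here is a minimal sketch of the "run it once, get every cut" version. It is not the program described above. It assumes the DMOZ RDF dump puts each `<Topic r:id="Top/...">` and `<ExternalPage ...>` block on its own lines and that external pages carry a `<topic>Top/...</topic>` child; the name `split_by_top_section` is made up for illustration. The dump is read once, line by line, and each finished block is handed to the writer for its top-level section.

```python
import re
from typing import Callable, Dict, Iterable, List

# Assumed DMOZ RDF layout: blocks open with <Topic r:id="Top/X/..."> or
# <ExternalPage about="...">, and close with </Topic> or </ExternalPage>.
TOPIC_ID = re.compile(r'<Topic r:id="Top/([^/"]+)')
PAGE_TOPIC = re.compile(r'<topic>Top/([^/<]+)')
BLOCK_START = re.compile(r'\s*<(Topic|ExternalPage)\b')
BLOCK_END = re.compile(r'</(Topic|ExternalPage)>')


def split_by_top_section(lines: Iterable[str],
                         writers: Dict[str, Callable[[str], None]]) -> Dict[str, int]:
    """Route every Topic/ExternalPage block to writers[section] in one pass.

    `writers` maps a top-level section name (e.g. "Arts") to a write
    function such as an open file's .write. Blocks for sections without a
    writer are skipped. Returns the number of blocks written per section.
    """
    counts: Dict[str, int] = {}
    block: List[str] = []
    for line in lines:
        if not block and not BLOCK_START.match(line):
            continue  # header, footer, or text between blocks
        block.append(line)
        if BLOCK_END.search(line):
            text = "".join(block)
            block = []
            # Topic blocks name their section in r:id; pages in <topic>.
            m = TOPIC_ID.search(text) or PAGE_TOPIC.search(text)
            if m and m.group(1) in writers:
                section = m.group(1)
                writers[section](text)
                counts[section] = counts.get(section, 0) + 1
    return counts
```

To produce two cuts in one run you might pass the opened dump as `lines` and `{"Arts": arts_file.write, "Science": science_file.write}` as `writers`; only one block is held in memory at a time.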

I do have cuts available, where I've taken the top-level sections and put them into separate files with the proper headers. I don't know how important that is, but it works. I try to generate them once a month, so my new sites get a fresher copy of the DMOZ data.
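
If "proper headers" means each cut should still parse as a complete RDF document (an assumption), one way to sketch it is to copy the dump's prolog onto every cut and close it with `</RDF>`. The helpers `read_header` and `wrap_cut` are hypothetical:

```python
import itertools
from typing import Iterable, Iterator, List


def _is_prolog(line: str) -> bool:
    # Assumption: everything before the first <Topic> or <ExternalPage> line
    # is the prolog (XML declaration plus the <RDF ...> tag and namespaces).
    return not line.lstrip().startswith(("<Topic", "<ExternalPage"))


def read_header(lines: Iterable[str]) -> List[str]:
    """Return the prolog lines from the start of the dump.

    With a file object, one line past the header is also consumed, so
    reopen the file (or seek(0)) before streaming its body.
    """
    return list(itertools.takewhile(_is_prolog, lines))


def wrap_cut(header: List[str], body: Iterable[str],
             footer: str = "</RDF>\n") -> Iterator[str]:
    """Yield header + body + footer so the cut is a standalone RDF file."""
    yield from header
    yield from body
    yield footer
```

A cut file could then be written with `out.writelines(wrap_cut(header, cut_lines))`.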

Part of the problem for most people is simply the size of the files: they don't have enough disk space or RAM to deal with them. The smaller "cuts" work better, and they work better for general use as well.
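
RAM use, at least, doesn't have to grow with the file if the XML is streamed instead of loaded whole. A sketch using the standard library's `xml.etree.ElementTree.XMLPullParser`, which parses input fed in chunks; it assumes the DMOZ element names `ExternalPage` and `topic`, and `count_links_per_section` is just a hypothetical example task. Clearing the root after each page keeps only a handful of elements in memory at a time.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Iterable, Optional, Union


def local(tag: str) -> str:
    """Drop the '{namespace}' prefix ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def count_links_per_section(chunks: Iterable[Union[str, bytes]]) -> Counter:
    """Count ExternalPage entries per top-level section, e.g. 'Arts'.

    Assumes DMOZ-style <ExternalPage> elements with a <topic>Top/...</topic>
    child. Input arrives in chunks, so the whole file is never in memory.
    """
    counts: Counter = Counter()
    parser = ET.XMLPullParser(events=("start", "end"))
    root: Optional[ET.Element] = None

    def drain() -> None:
        nonlocal root
        for event, elem in parser.read_events():
            if event == "start":
                if root is None:
                    root = elem  # the outer <RDF> element
                continue
            if local(elem.tag) != "ExternalPage":
                continue
            for child in elem:
                if local(child.tag) == "topic" and child.text:
                    parts = child.text.split("/")
                    if len(parts) > 1:
                        counts[parts[1]] += 1  # "Top/Arts/Movies" -> "Arts"
            # Drop finished elements; otherwise the tree grows with the file.
            root.clear()

    for chunk in chunks:
        parser.feed(chunk)
        drain()
    parser.close()
    drain()
    return counts
```

With a real dump you would feed it fixed-size reads, e.g. `iter(lambda: f.read(1 << 20), b"")` on a file opened in binary mode.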

As for vi or any of the other editors, if you can open the file in "read-only" mode so the program has lower overhead, you might be able to mark the part you want and save it to another file (I know joe and Emacs can do that; I've never gotten the hang of vi).
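
The same "mark a block and save it" step can also be done without an editor, reading one line at a time. A sketch, assuming the section you want is contiguous in the file and you know a string that appears on the first line of the next section; `extract_range` is a hypothetical helper:

```python
from typing import Iterable, Iterator


def extract_range(lines: Iterable[str], start: str, stop: str) -> Iterator[str]:
    """Yield lines from the first one containing `start` up to, but not
    including, the next one containing `stop`.

    Only one line is in memory at a time, so file size doesn't matter.
    """
    inside = False
    for line in lines:
        if not inside:
            if start not in line:
                continue  # still before the marked block
            inside = True
        elif stop in line:
            return  # reached the start of the next section
        yield line
```

For example, `dst.writelines(extract_range(src, '<Topic r:id="Top/Arts"', '<Topic r:id="Top/Business"'))`, assuming Business follows Arts in the dump. Including the closing quote in the marker keeps `Top/Arts/Movies` from matching.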

It would really be a good job for a summer (or fall) intern: take the import program, combine it with the parse routines, and make it able to pre-parse, take out a cut of DMOZ, then import that cut. It would be nice to do it all at once, but doing it in several passes (like old-style compilers) uses fewer resources, and it allows setting up categories, import locations, and such, which would take loads of RAM and linked lists to do in a single pass.
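
A rough sketch of the multi-pass idea: pass 1 only collects the category ids in a cut, so the categories can be set up first, and pass 2 streams the dump again and keeps only the links whose category survived. Only the set of category names stays in memory, never the links. The regexes assume the one-tag-per-line layout of the DMOZ RDF dump, and `pass_categories` and `pass_links` are hypothetical names, not functions from the existing import program.

```python
import re
from typing import Iterable, Iterator, Optional, Set, Tuple

# Assumed one-tag-per-line layout of the DMOZ RDF dump.
TOPIC_RE = re.compile(r'<Topic r:id="([^"]+)"')
PAGE_RE = re.compile(r'<ExternalPage about="([^"]+)"')
PAGE_TOPIC_RE = re.compile(r'<topic>([^<]+)</topic>')


def pass_categories(lines: Iterable[str], prefix: str) -> Set[str]:
    """Pass 1: collect category ids at or under `prefix` (e.g. 'Top/Arts').

    These are the categories you would create before importing any links.
    """
    cats: Set[str] = set()
    for line in lines:
        m = TOPIC_RE.search(line)
        if m and (m.group(1) == prefix or m.group(1).startswith(prefix + "/")):
            cats.add(m.group(1))
    return cats


def pass_links(lines: Iterable[str], cats: Set[str]) -> Iterator[Tuple[str, str]]:
    """Pass 2: yield (category, url) for links whose category is in `cats`.

    Links are streamed one at a time; only `cats` is held in memory.
    """
    url: Optional[str] = None
    for line in lines:
        m = PAGE_RE.search(line)
        if m:
            url = m.group(1)  # remember the page until its <topic> line
            continue
        m = PAGE_TOPIC_RE.search(line)
        if m and url is not None:
            if m.group(1) in cats:
                yield m.group(1), url
            url = None
```

Each pass opens the dump afresh, which trades a second read of the file for not keeping a tree of categories and links in RAM.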

Anyway...

PUGDOG® Enterprises, Inc.
FAQ: http://LinkSQL.com/FAQ
Plugins: http://LinkSQL.com/plugin
Subject                           Author  Views  Date
Split the DMOZ data?              soobe   4997   Aug 26, 2001, 8:57 AM
Re: Split the DMOZ data?          Paul    4864   Aug 26, 2001, 9:06 AM
Re: Split the DMOZ data?          Eraser  4855   Aug 26, 2001, 11:35 AM
Re: Split the DMOZ data?          Paul    4919   Aug 26, 2001, 11:57 AM
Re: Split the DMOZ data?          Eraser  4896   Aug 26, 2001, 1:28 PM
Re: Split the DMOZ data?          Paul    4870   Aug 26, 2001, 1:47 PM
Re: Split the DMOZ data?          Eraser  4967   Aug 26, 2001, 1:57 PM
Re: Split the DMOZ data?          Paul    4882   Aug 26, 2001, 3:09 PM
Re: Split the DMOZ data?          Eraser  4820   Aug 26, 2001, 3:22 PM
Re: Split the DMOZ data?          Alex    4819   Aug 26, 2001, 10:51 PM
Re: Split the DMOZ data?          pugdog  4818   Aug 27, 2001, 12:16 AM
Re: Split the DMOZ data?          soobe   4824   Aug 27, 2001, 1:14 PM
Re: [soobe] Split the DMOZ data?  fabio   4640   Jul 16, 2002, 1:30 AM
Re: [fabio] Split the DMOZ data?  Andy    4692   Jul 17, 2002, 4:11 AM