Gossamer Forum

DMOZ Integration II

That other thread is just getting too big to make any sense of...

IMO, you (and my Hyperseek users) would be better served by running DMOZ conversions from one common, consistent location, rather than having hundreds of you download and convert the entire mess every time it changes. I started something like this at hyperseek.com/dmoz.shtml, but due to my time constraints, I couldn't keep up with the constant reloading of the DMOZ exports.

If anyone wants to volunteer the space and time, I'll throw the programming at you and get you started with loading and maintaining it. That way, anyone can go there, select what they want, and download it as needed, always from an up-to-date database, without having to mess with those huge .rdf files.
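As a rough illustration of the "select what they want" step, here is a minimal Python sketch that pulls one category (and everything beneath it) out of a converted dump. It assumes a hypothetical record layout of `url|title|description|topic`, one link per line, with topics written as DMOZ-style paths such as `Top/Arts`. The thread doesn't fix a layout, so the field order and the `select_category` name are assumptions.

```python
import sys
from typing import Iterable, Iterator


def select_category(lines: Iterable[str], prefix: str) -> Iterator[str]:
    """Yield records whose topic is `prefix` itself or any subcategory of it.

    Assumes (hypothetically) one record per line: url|title|description|topic.
    """
    prefix = prefix.rstrip("/")
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) != 4:
            continue  # skip malformed records rather than guessing
        topic = fields[3]
        # Match the category itself or anything below it, but not a
        # sibling that merely shares leading characters.
        if topic == prefix or topic.startswith(prefix + "/"):
            yield line


if __name__ == "__main__":
    # Usage: python select_category.py Top/Arts < links.txt > arts.txt
    sys.stdout.writelines(select_category(sys.stdin, sys.argv[1]))
```

Matching on `prefix + "/"` keeps a request for `Top/Arts` from accidentally pulling in a sibling category such as a hypothetical `Top/Artsy`.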

If you're interested, let me know, and I'll work with you on it.

If a bunch of you reply to this thread, I'll let you hash out among yourselves who's going to "be the man" here. What you'll need is a web server with a ****load of disk space, MySQL, and the Perl DBI/DBD modules. Having access to the Net::FTP module would also be very helpful, as I can then have the system update itself programmatically.
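The self-updating idea boils down to "check whether the remote dump is newer than the copy we have, and download it only if it is." In Perl that would go through Net::FTP; the sketch below shows the same logic with Python's standard `ftplib` instead. The host, directory, and file names are placeholders, and the sketch assumes the server answers the `MDTM` command (RFC 3659), which not every FTP server supports.

```python
import ftplib
from datetime import datetime
from typing import Optional


def parse_mdtm(reply: str) -> datetime:
    """Turn an MDTM reply such as '213 19991227083000' into a datetime.

    RFC 3659 timestamps are UTC and may carry fractional seconds,
    which are dropped here.
    """
    code, _, stamp = reply.partition(" ")
    if code != "213":
        raise ValueError(f"unexpected MDTM reply: {reply!r}")
    return datetime.strptime(stamp.strip()[:14], "%Y%m%d%H%M%S")


def needs_update(remote: datetime, local: Optional[datetime]) -> bool:
    """Download when we have no copy yet or the remote file is newer."""
    return local is None or remote > local


def fetch_if_newer(host: str, remote_dir: str, filename: str,
                   dest_path: str, last_fetched: Optional[datetime]) -> Optional[datetime]:
    """Fetch `filename` into `dest_path` if it changed; return its timestamp, else None."""
    with ftplib.FTP(host) as ftp:
        ftp.login()  # anonymous login
        ftp.cwd(remote_dir)
        remote_time = parse_mdtm(ftp.sendcmd(f"MDTM {filename}"))
        if not needs_update(remote_time, last_fetched):
            return None
        with open(dest_path, "wb") as fh:
            ftp.retrbinary(f"RETR {filename}", fh.write)
        return remote_time


if __name__ == "__main__":
    # Placeholder host and paths: substitute the real dump location.
    fetch_if_newer("ftp.example.org", "/pub/rdf", "dump.rdf.gz",
                   "dump.rdf.gz", last_fetched=None)
```

A nightly cron job calling `fetch_if_newer`, and rerunning the conversion and load only when it returns a timestamp, would give the "update itself" behavior without anyone having to babysit the reloads.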

Anyway, you guys run with the idea, let me know, and I'll jump in with whoever ends up as the "dmoz-master" and get it all started. I've got some good ideas on ways to improve what I'd started earlier, and I'm sure you guys can come up with even more...

John
Re: DMOZ Integration II
Hi everyone. This is my first post, although I have been following the thread very closely. Like most of you, I want to start my "own" directory.

Anyway, jcokos, although I'm by no means a Perl whiz, nor really willing to do any programming on the project you talked about above (mainly because it's way past my ability level), I may be able to provide the server space. I have a high-volume hosting account that may help.

Maybe we can work out a deal. I have been checking out hyperseek.com, and I have to tell you it's one of the most impressive scripts (or I should say combinations of scripts) I have ever seen. Of course, you must understand I am a full-time student and stretch every dollar as far as I can. I'm even using FreeI (freei.net) because I have to pay the hosting company. As you may have guessed, I would love to have my own copy of Hyperseek.

...So, how much space/bandwidth is needed?
Re: DMOZ Integration II
OK, I sent an email.

Approximately how much space/bandwidth is needed?

[This message has been edited by trd (edited December 27, 1999).]
Re: DMOZ Integration II
This is not the proper place for us to discuss the actual directory software or deals (you can email me privately). The goal here is to find a home for a common access point for using and maintaining a usable DMOZ dump...

John
Re: DMOZ Integration II
Server Specs needed for hosting the DMOZ Downloader:

Unix/Linux
MySQL database
Perl 5
Perl DBI/DBD::mysql

As much memory as you can afford (you can never have enough)

For disk space, the raw .rdf files take about 300-500 MB, as do the converted text files (so you'll need about 1 GB to handle all of the raw data).

Once the data is loaded into a MySQL database, figure on another 1-2 GB to hold it all in MySQL's data and index format.
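For the load step, one plausible approach is to bulk-insert the converted pipe-delimited lines in batches and build the index only after the data is in, since maintaining an index row by row during a multi-million-row insert is often slower. The sketch uses Python's DB-API with the standard-library `sqlite3` module standing in for MySQL so it runs without a server; the `links` table, its columns, and the `load` function are hypothetical. With a MySQL driver the SQL is much the same, but the placeholder style is typically `%s` instead of `?`.

```python
import sqlite3
from typing import Iterable, Iterator, Tuple

# Hypothetical schema for url|title|description|topic records.
SCHEMA = """
CREATE TABLE IF NOT EXISTS links (
    url TEXT, title TEXT, description TEXT, topic TEXT
)
"""
INDEX = "CREATE INDEX IF NOT EXISTS idx_links_topic ON links (topic)"
INSERT = "INSERT INTO links VALUES (?, ?, ?, ?)"


def parse_records(lines: Iterable[str]) -> Iterator[Tuple[str, ...]]:
    """Split each pipe-delimited line into a 4-tuple, skipping malformed lines."""
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) == 4:
            yield tuple(fields)


def load(conn: sqlite3.Connection, lines: Iterable[str], batch_size: int = 1000) -> int:
    """Insert records in batches, index afterwards, and return the row count."""
    conn.execute(SCHEMA)
    batch = []
    total = 0
    for record in parse_records(lines):
        batch.append(record)
        if len(batch) >= batch_size:
            conn.executemany(INSERT, batch)
            total += len(batch)
            batch.clear()
    if batch:  # flush the final partial batch
        conn.executemany(INSERT, batch)
        total += len(batch)
    conn.execute(INDEX)  # build the index once, after the bulk load
    conn.commit()
    return total


if __name__ == "__main__":
    with open("links.txt", encoding="utf-8") as fh:  # hypothetical file name
        print(load(sqlite3.connect("dmoz.db"), fh))
```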

So, at minimum, you'll need about 3 GB of disk space available; the more the better.
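Adding up the estimates above shows where the 3 GB figure comes from: it is the top of the combined range, so it covers the largest dumps plus the MySQL copy with nothing to spare.

```latex
\begin{aligned}
\text{raw .rdf} + \text{text} &\approx (0.3 \text{ to } 0.5) + (0.3 \text{ to } 0.5) = 0.6 \text{ to } 1.0\ \text{GB} \\
\text{raw} + \text{text} + \text{MySQL} &\approx (0.6 \text{ to } 1.0) + (1 \text{ to } 2) = 1.6 \text{ to } 3.0\ \text{GB}
\end{aligned}
```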

As for bandwidth, you'll want a lot available, as I can see this thing getting pretty popular.

John
Re: DMOZ Integration II
It looks like you need a dedicated server just for it.
Someone mentioned in the other thread that their server crashed when they tried to convert the files, and the server admin was not too happy about it.
To do the conversion on your own, do you need MySQL?
jcokos, I tried your conversion utility, but it's very slow and it converted only a few links compared to the number of links found at DMOZ.
Re: DMOZ Integration II
The reason the other guy's server crashed is basic and simple...

Those .rdf files are simply huge. A full conversion takes a buttload of memory and disk space.

The converter doesn't use SQL at all; it just slurps in the .rdf and spits it out pipe-delimited...
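Reading the whole .rdf into memory at once is a large part of why a full conversion is so expensive. A streaming parser avoids that by handling one record at a time and then discarding it. Below is a minimal Python sketch of that alternative (not the converter described here) using the standard library's `xml.etree.ElementTree.iterparse`. It assumes the dump is well-formed XML in which each link is an `ExternalPage` element directly under the root, carrying an `about` URL attribute and `Title`, `Description`, and `topic` children; those tag names, and the `url|title|description|topic` output layout, are assumptions to check against a real dump.

```python
import io
import sys
import xml.etree.ElementTree as ET
from typing import IO, Iterator, TextIO, Tuple


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix ElementTree adds to namespaced names."""
    return tag.rsplit("}", 1)[-1]


def clean(field: str) -> str:
    """Collapse whitespace and replace '|' so one record stays on one line."""
    return " ".join(field.split()).replace("|", "/")


def iter_external_pages(source: IO[bytes]) -> Iterator[Tuple[str, str, str, str]]:
    """Stream (url, title, description, topic) tuples from a DMOZ-style RDF dump."""
    root = None
    depth = 0
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if root is None:
                root = elem  # the outer <RDF> element
            depth += 1
            continue
        depth -= 1
        if local_name(elem.tag) == "ExternalPage":
            url = next((v for k, v in elem.attrib.items() if local_name(k) == "about"), "")
            fields = {local_name(child.tag): child.text or "" for child in elem}
            yield (url, fields.get("Title", ""), fields.get("Description", ""),
                   fields.get("topic", ""))
        if depth == 1:
            # A direct child of the root (ExternalPage, Topic, ...) just ended:
            # drop everything parsed so far so memory stays roughly flat.
            root.clear()


def convert(source: IO[bytes], out: TextIO) -> int:
    """Write one url|title|description|topic line per page; return the count."""
    count = 0
    for record in iter_external_pages(source):
        out.write("|".join(clean(f) for f in record) + "\n")
        count += 1
    return count


# Tiny hypothetical sample in the assumed layout.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<RDF xmlns:r="http://www.w3.org/TR/RDF/" xmlns:d="http://purl.org/dc/elements/1.0/" xmlns="http://dmoz.org/rdf/">
  <Topic r:id="Top/Arts"><catid>2</catid></Topic>
  <ExternalPage about="http://www.example.com/">
    <d:Title>Example | Site</d:Title>
    <d:Description>An example
      page.</d:Description>
    <topic>Top/Arts</topic>
  </ExternalPage>
</RDF>
"""

if __name__ == "__main__":
    if len(sys.argv) > 1:
        with open(sys.argv[1], "rb") as fh:
            convert(fh, sys.stdout)
    else:
        convert(io.BytesIO(SAMPLE), sys.stdout)
```

Run with a file path (for example `python rdf2pipe.py content.rdf > links.txt`, where the script name is hypothetical) or with no argument to convert the built-in sample. Clearing the root after each top-level element is what keeps memory roughly flat, since ElementTree would otherwise keep the entire parsed tree alive.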

Yes, it almost seems like a dedicated server is required to run this thing properly. There are people out there, though, who have access to such things, and that's what we're looking for here...

John