Gossamer Forum

Novice DMOZ question

Hi, I have read through all the DMOZ posts, and there is one thing I don't understand (probably because I'm new to this stuff :))
I noticed that the nph-import.cgi command includes the option --source="/PATH/TO/content.rdf.u8.gz"
Does that mean I first need to have content.rdf.u8.gz on my server, or can I import directly from the dmoz.org site? (I need relatively few categories, with fewer than 40,000 links.)

If I have to put the file on my (virtual) server first, then I have two additional questions:

1. How do I transfer the content.rdf file to the server without first downloading it to my PC and then uploading it to the server? (I'm on dial-up, so it would take many hours to transfer through the PC.)

2. Will the parsing procedure unzip the whole content.rdf file, or will it uncompress just the relevant categories? (I only need 25,000-35,000 of the links on DMOZ.) Uncompressing the whole file would put me over my disk quota.

Thanks; any help will be appreciated.

- Didi

Re: Novice DMOZ question
Telnet> wget http://www.dmoz.org/rdf/content.rdf.u8.gz

You can import using gzip, which keeps the file at 130MB; the other method requires it to be unzipped to 700MB.
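Reading a .gz file through a decompressor, instead of unpacking it first, produces the uncompressed bytes on the fly without ever saving them. As a quick way to check the 700MB figure against a disk quota, here is a minimal Python 3 sketch that streams the archive and only counts bytes. It assumes Python 3 is available on the host (not a given on virtual hosting), and `uncompressed_size` is a hypothetical helper, not part of nph-import.cgi.

```python
import gzip


def uncompressed_size(path: str, chunk_size: int = 1 << 20) -> int:
    """Return how many bytes the .gz file at `path` holds once decompressed.

    Data is decompressed in memory one chunk at a time and then discarded,
    so nothing besides the original .gz file is written to disk.
    """
    total = 0
    with gzip.open(path, "rb") as stream:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:  # end of the compressed stream
                break
            total += len(chunk)
    return total
```

For example, `uncompressed_size("content.rdf.u8.gz")` returns the byte count while only the compressed file occupies disk space.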

Look at the thread just below this for the commands to use when running nph-import.cgi

Paul Wilson.
http://www.wiredon.net/gt/
http://www.perlmad.com/
Re: Novice DMOZ question
Dear Paul,

Thanks for the advice.
When I'm at the telnet prompt (telnet>) and type "wget http://www.dmoz.org/rdf/content.rdf.u8.gz", I get "?Invalid command".

Any advice?

Thanks,

Didi

Re: Novice DMOZ question
Mishpat,

If you are running an out-of-the-box Red Hat installation, wget is not included. I would suggest installing it from your Linux distribution's support site or getting it through CPAN.

Drew

Drew Selman
Earthnet Communications
http://www.i-cram.com
http://www.mcpzone.com
Re: Novice DMOZ question
The problem is I'm using virtual hosting, so I can't just install WGET :)
Is there any way to import the file without WGET?

Thanks,

Didi

Re: Novice DMOZ question
I would take one of two tacks:

1) Ask your virtual host to install wget for you. It probably won't happen, since they may see it as a security risk, but give it a twirl; you never know. Or, if you have a spare box or a Linux dual boot, wget the file there and then FTP it to your host.

2) Try saving the file from the web. The problem here is that some browsers try to interpret the .gz and uncompress it as it goes. I tried it for the hell of it, and IE 5 kept going well past the 131MB it's listed at.
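If the host allows scripts to make outbound HTTP connections (an assumption; many virtual hosts block this), the file can also be fetched server-side without wget. Below is a minimal Python 3 sketch, assuming Python 3 is installed on the host; `download` and `looks_like_gzip` are hypothetical helper names. The second function checks for the two gzip magic bytes, which is one way to tell whether a copy was decompressed in transit, the problem described with IE 5.

```python
import shutil
import urllib.request

# Every gzip file starts with these two bytes (ID1 and ID2 in RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def download(url: str, dest: str, chunk_size: int = 1 << 16) -> int:
    """Copy the resource at `url` to the local file `dest`, chunk by chunk.

    Only `chunk_size` bytes are held in memory at a time. urllib.request does
    not transparently decompress responses, so a .gz file is saved exactly as
    the server sends it. Returns the number of bytes written.
    """
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
        return out.tell()


def looks_like_gzip(path: str) -> bool:
    """True if `path` starts with the gzip magic bytes, i.e. the download was
    not silently uncompressed on the way."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC
```

With the URL from earlier in the thread, `download("http://www.dmoz.org/rdf/content.rdf.u8.gz", "content.rdf.u8.gz")` saves the file, and `looks_like_gzip("content.rdf.u8.gz")` should then be `True` for an intact copy.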

Drew Selman
Re: Novice DMOZ question
You mentioned you can import using gzip. Can anyone give the command needed to import using this method? I only need to import a small subcategory.

I have the compressed file on my server. I could decompress it, but that would take me way over my disk quota, so I wouldn't be able to do anything else!
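Whatever the import script does internally, the general idea of pulling one subcategory out of the compressed dump without unpacking it can be sketched in Python 3: decompress line by line in memory and keep only the records under one topic. This is not nph-import.cgi's code. It assumes each `<ExternalPage about="...">` record in content.rdf.u8 puts its `<d:Title>` and `<topic>` tags on separate lines, which is an assumption about the dump's layout; `links_under` and `Link` are hypothetical names.

```python
import gzip
import re
from html import unescape
from typing import Iterator, NamedTuple, Optional


class Link(NamedTuple):
    url: str
    title: str
    topic: str


# Assumed layout of one record in content.rdf.u8 (one tag per line):
#   <ExternalPage about="http://...">
#     <d:Title>...</d:Title>
#     <topic>Top/...</topic>
#   </ExternalPage>
ABOUT = re.compile(r'<ExternalPage about="([^"]*)"')
TITLE = re.compile(r"<d:Title>(.*?)</d:Title>")
TOPIC = re.compile(r"<topic>(.*?)</topic>")


def links_under(path: str, prefix: str) -> Iterator[Link]:
    """Yield every link whose topic is `prefix` or one of its subcategories.

    The .gz file is decompressed line by line in memory; only the current
    record is kept, so no uncompressed copy is ever written to disk.
    """
    url: Optional[str] = None
    title = ""
    topic: Optional[str] = None
    with gzip.open(path, "rt", encoding="utf-8") as lines:
        for line in lines:
            m = ABOUT.search(line)
            if m:  # start of a new record
                url, title, topic = unescape(m.group(1)), "", None
                continue
            if url is None:
                continue  # outside an ExternalPage record
            m = TITLE.search(line)
            if m:
                title = unescape(m.group(1))  # decode &amp; and friends
                continue
            m = TOPIC.search(line)
            if m:
                topic = m.group(1)
                continue
            if "</ExternalPage>" in line:
                # Match the topic itself or anything below it, but not
                # siblings that merely share a name prefix.
                if topic == prefix or (topic or "").startswith(prefix + "/"):
                    yield Link(url, title, topic)
                url, topic = None, None
```

For a hypothetical subcategory, `list(links_under("content.rdf.u8.gz", "Top/Arts/Movies"))` returns only the links in `Top/Arts/Movies` and its subcategories, so the full uncompressed file never lands on disk.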

Re: Novice DMOZ question
See:

http://www.gossamer-threads.com/...ew=&sb=&vc=1

Paul
Installations:http://wiredon.net/gt/
Support: http://wiredon.net/forum/