Gossamer Forum

DMOZ content.rdf.u8 file...

I'm just wondering if anyone else is having the same problem as me. For some reason the .gz file no longer decompresses properly. I'm getting content.rdf.u8.gz from dmoz.org/rdf/. The error I am getting is:

gzip: content.rdf.u8.gz: unexpected end of file

I'm using wget to grab the file, and that seems to fetch it from their server OK. I'm just wondering if anyone else is having the same problem.

Andy (mod)
andy@ultranerds.co.uk
Want to give me something back for my help? Please see my Amazon Wish List
GLinks ULTRA Package | GLinks ULTRA Package PRO
Links SQL Plugins | Website Design and SEO | UltraNerds | ULTRAGLobals Plugin | Pre-Made Template Sets | FREE GLinks Plugins!
Re: [A.J.] DMOZ content.rdf.u8 file... In reply to
You mean the copy you get is still compressed when it has finished downloading? It has always uncompressed on its own during download for me on Win9.x/ME/2k with NS/IE. I haven't tried Linux yet.

Also, there's a warning on DMOZ that recent dumps have been corrupted and whatnot. I started downloading the dump at 2:15 this afternoon and I'm just now at 650MB. Sheesh... just got cable and I'm already dissatisfied with my connection speed. I've only broken 220kbs twice and I'm supposed to be at 384 (cheapest plan available).

--Philip
Links 2.0 moderator
Re: [King Junko II] DMOZ content.rdf.u8 file... In reply to
Ah, with UNIX you just log in over SSH and do:

#1 Type: wget http://dmoz.org/rdf/content.rdf.u8.gz
#2 Once that is downloaded (about 150MB), decompress it with: gzip -d content.rdf.u8.gz
#3 You should then be left with a 950MB(ish) file called content.rdf.u8.

However, stage 2 is where I am getting the problem above. I also noticed that warning on their site about possible problems, but I thought that was only related to having duplicate categories?
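
One way to tell a truncated download apart from some other problem is to stream through the whole .gz file and see whether it decompresses cleanly to the end. The following is only a rough sketch in Python, not something from the DMOZ site; the function name `gzip_is_complete` and the chunk size are made up for illustration.

```python
import gzip
import zlib


def gzip_is_complete(path, chunk_size=1024 * 1024):
    """Return True if the gzip file at `path` decompresses cleanly to the end.

    Hypothetical helper: the name and chunk size are illustrative only.
    """
    try:
        with gzip.open(path, "rb") as fh:
            # Read and discard the data in chunks so that a dump of
            # roughly 1 GB never has to fit in memory at once.
            while fh.read(chunk_size):
                pass
    except (EOFError, OSError, zlib.error):
        # EOFError: the stream stopped before its end-of-stream marker,
        #           which is what a cut-off download looks like.
        # OSError / zlib.error: bad header, CRC mismatch or corrupt data.
        return False
    return True


if __name__ == "__main__":
    print(gzip_is_complete("content.rdf.u8.gz"))
```

If this returns False straight after the download, the copy on disk is probably incomplete or damaged. Running `gzip -t content.rdf.u8.gz` on the command line performs a similar integrity test without writing the decompressed file.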

Anyway, thanks, and if anyone has any luck getting it working, please let me know.

Andy (mod)
Re: [A.J.] DMOZ content.rdf.u8 file... In reply to
Hm... sounds like it's time to install Linux again on my old hard drive. Anyway, I got the sucker completely downloaded and split up my main category today. I'm feeding each file through a small program that imports the links into an SQL database. So far I've got over 2 million records in the database and damn is it slow, even with indexes!
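
For bulk imports like this, the usual suspects for slowness are committing after every row and keeping indexes up to date during the load. Below is a rough sketch of the batched approach in Python. It assumes SQLite purely for illustration (the database actually used here isn't stated), and the table layout, column names and `import_links` function are all hypothetical.

```python
import sqlite3


def import_links(conn, links, batch_size=10_000):
    """Insert (url, title, category) tuples in batches; return the row count.

    Hypothetical schema and helper, for illustration only.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS links (url TEXT, title TEXT, category TEXT)"
    )
    total = 0
    batch = []
    for row in links:
        batch.append(row)
        if len(batch) >= batch_size:
            with conn:  # one transaction (one commit) per batch
                conn.executemany("INSERT INTO links VALUES (?, ?, ?)", batch)
            total += len(batch)
            batch = []
    if batch:  # flush whatever is left over
        with conn:
            conn.executemany("INSERT INTO links VALUES (?, ?, ?)", batch)
        total += len(batch)
    # Build the index once, after the bulk load, rather than having it
    # updated on every single insert.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_links_category ON links (category)"
    )
    return total


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    sample = [("http://example.com/", "Example", "Top/Test")]
    print(import_links(conn, sample))
```

Whether this helps depends on where the time is really going; if the slowness is in queries rather than inserts, the index columns matter more than the batching.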

--Philip
Links 2.0 moderator
Re: [King Junko II] DMOZ content.rdf.u8 file... In reply to
Ah, it's OK now. Not sure whether the file was getting corrupted during download, or whether the file on their server was itself unstable at the time.

Andy (mod)