Gossamer Forum

DMOZ Dumps.....

Hello,

I am in the middle of dumping all DMOZ categories for people to import into their own databases. This means you DON'T have to spend hours or even days importing the data from the 700MB RDF file.

The dumps are gzipped and vary in size depending on the category (e.g. Business is larger than Adult).

To give you an idea, I just gzipped 200,000 links at compression 9 and the resulting file was only 13MB. Before gzipping it was 70MB.
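
As a rough check on those figures, here is a minimal Python sketch (not part of the original post) that works out the compression ratio from the two sizes quoted above and shows gzip at level 9 using the standard library. The function names are made up for illustration.

```python
import gzip


def compression_ratio(original_mb, compressed_mb):
    """Return how many times smaller the compressed file is."""
    return original_mb / compressed_mb


def space_saved_percent(original_mb, compressed_mb):
    """Return the percentage of space saved by compressing."""
    return 100 * (1 - compressed_mb / original_mb)


def gzip_level9(data):
    """Compress bytes with gzip at the maximum compression level (9)."""
    return gzip.compress(data, compresslevel=9)


# Figures quoted in the post: 70MB before gzipping, 13MB after.
print(round(compression_ratio(70, 13), 1))   # 5.4
print(round(space_saved_percent(70, 13)))    # 81
```

So the quoted sizes work out to a file roughly 5.4 times smaller, or about 81% less to download.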

The dumps can be imported using MySQLMan or phpMyAdmin and should only take a few minutes to import.
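
For anyone who would rather import from a shell account than through MySQLMan or phpMyAdmin, something along these lines may work. This is only a sketch: it assumes the unpacked dump is a plain SQL file and that the `mysql` command-line client is installed. The database name, user name and file name are placeholders, not values from this thread.

```python
import subprocess


def build_import_command(database, user):
    """Build the mysql client command line for importing into `database`."""
    # --password with no value makes the mysql client prompt for the password,
    # so it never appears in the process list or shell history.
    return ["mysql", f"--user={user}", "--password", database]


def import_dump(sql_path, database, user):
    """Feed an SQL dump file to the mysql client on standard input."""
    with open(sql_path, "rb") as dump:
        subprocess.run(build_import_command(database, user), stdin=dump, check=True)


# Hypothetical usage (placeholder names):
# import_dump("Music.sql", "links_db", "links_user")
```

This is the scripted equivalent of running `mysql --user=links_user --password links_db < Music.sql` by hand.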

The dumps aren't free because they are taking a huge amount of time to do. For example, I am doing the Regional category at the moment; it has taken 3 days so far and is not yet finished.

To obtain the gzipped file you will need to log in to your telnet account and issue the following command:

wget ftp://ftp.myurl.com/DMOZ_Dumps/categoryname.tar.gz
(You will be given the proper URL after paying)

This will put the dump on your server. You then need to unzip it....
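
The unzip step can also be scripted. Below is a minimal sketch using Python's standard `tarfile` module, assuming the download is a gzipped tar archive as the `.tar.gz` name in the command above suggests. The function name and the file names in the usage comment are hypothetical. It extracts regular files only and refuses any entry that would end up outside the target directory.

```python
import shutil
import tarfile
from pathlib import Path


def unpack_dump(archive_path, dest_dir):
    """Extract the regular files from a .tar.gz archive into dest_dir.

    Returns the list of extracted file paths.
    """
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories, links and special files
            target = (dest / member.name).resolve()
            # Refuse entries such as "../../etc/passwd" that escape dest_dir.
            if dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
            target.parent.mkdir(parents=True, exist_ok=True)
            with archive.extractfile(member) as src, open(target, "wb") as out:
                shutil.copyfileobj(src, out)
            extracted.append(target)
    return extracted


# Hypothetical usage (placeholder names):
# unpack_dump("categoryname.tar.gz", "dmoz_dump")
```

From the shell, `tar -xzf categoryname.tar.gz` does the same job in one line.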

Please reply to this thread if you are interested, email pwilson@wiredon.net, or send me a PM.

Thanks.

PS. Currently I only have the earlier categories plus Regional. I am working my way through the rest, but it will be a week or two before they are all done.

Paul Wilson.
http://www.wiredon.net/gt/
http://www.perlmad.com/
Re: DMOZ Dumps.....
Update:

I have the music category done and the 600,000 regional category will be done in a few days.

I'm working my way through them all.....

Installations:http://www.wiredon.net/gt/
Support: http://www.wiredon.net/forum/

Re: DMOZ Dumps.....
Is no-one interested?

I have 150,000 links from Regional imported so far, so it should be finished by tomorrow and I will be gzipping it up.....

Remember the music category is done, and tomorrow or the day after I will have Adult, Arts etc.....

Paul
Installations:http://wiredon.net/gt/
Support: http://wiredon.net/forum/

Re: DMOZ Dumps.....
I'm interested, but I have to look over the DMOZ directory more. There are so many links in there, I'm afraid even importing one category may make my database huge.

How many links can Links SQL handle with ease? Ten thousand or fewer, or more than ten thousand? I really don't want to create a giant database since that defeats the purpose of my specialized links site.

Anyways, I thought I'd let you know my thinking....

Bryan

Re: DMOZ Dumps.....
Hi,

Oh yes, Links SQL can cope with plenty more than that. With a decent server and a good amount of RAM, you could store 1,000,000 (one million) links.

10,000 links is fairly small for a Links SQL database, or for what Links SQL can cope with.

I expect that the Regional category will require around 200MB of space, but that is because it is one of the biggest categories with 600,000 links in it.

Paul
Installations:http://wiredon.net/gt/
Support: http://wiredon.net/forum/

Re: DMOZ Dumps.....
Paul said, "I expect that the Regional category will require around 200MB of space, but that is because it is one of the biggest categories with 600,000 links in it."

Is 200MB the size of the database to be imported, or is it the expected space required to build the 600,000 static links and their associated pages?

Re: DMOZ Dumps.....
If I remember rightly, 400,000 links came to about 85MB gzipped, so the 200MB will be for the database. You will need more on top of that for the pages, but obviously I can't give you an exact figure.

Paul
Installations:http://wiredon.net/gt/
Support: http://wiredon.net/forum/

Re: DMOZ Dumps.....
Done so far.....

Adult: 2.5MB gzipped (16MB uncompressed)
Arts: 12.5MB gzipped (72MB uncompressed)
Arts/Music: 2.8MB gzipped (15.5MB uncompressed)
Business: 10MB gzipped (60MB uncompressed)

Paul
Installations:http://wiredon.net/gt/
Support: http://wiredon.net/forum/

Re: DMOZ Dumps.....
OK I know there are quite a few interested in this one.......

Regional is now DONE!

202MB unzipped.
37MB gzipped.

Contact support@wiredon.net

Paul
Installations:http://wiredon.net/gt/
Support: http://wiredon.net/forum/

Re: DMOZ Dumps.....
What is the date of your DMOZ dumps? I broke all the areas down about 3 months ago.

Can't never could do nothing till he whupped old couldn't till he could...
Re: DMOZ Dumps.....
Date?

Paul
Installations:http://wiredon.net/gt/
Support: http://wiredon.net/forum/

Re: DMOZ Dumps.....
What I mean by that is: how old is the dump? Since I downloaded mine, there has been an increase of about 16MB, so mine is severely outdated. With the amount of time it took me to do it, similar to the time you are spending, I might be better off with a subscription sort of thing for the categories that I need, to keep an updated database with the most current DMOZ dump. But that would require breaking the dump down once or twice a month, or something like that...

Can't never could do nothing till he whupped old couldn't till he could...
Re: DMOZ Dumps.....
I downloaded the RDF file for the Regional category about a week ago, and the other categories were done using an RDF file that I downloaded about 2 weeks ago, so it is pretty much 100% up to date.

Paul
Installations:http://wiredon.net/gt/
Support: http://wiredon.net/forum/

Re: DMOZ Dumps.....
Understood.
Is there a way you can keep it updated, say once a month, if we purchased a block?

Can't never could do nothing till he whupped old couldn't till he could...
Re: DMOZ Dumps.....
Hmm......

So would you be wanting free updates after the first purchase?
I'm not sure if I'd want to do that because it would be a bit of an effort every month for nothing in return. (Yuck that sounds like greed but it wasn't meant to).

However, maybe an update fee of $10 after your first purchase would be good for me.

Do any others have thoughts on this?

What I will say is that I don't really want to take custom orders. People have been asking for certain sub-categories recently, which I don't mind occasionally, but I don't want to be doing it all the time. I would prefer people to just take the top-level categories and delete what they don't want themselves.

Paul
Installations:http://wiredon.net/gt/
Support: http://wiredon.net/forum/

Re: DMOZ Dumps.....
Actually, I was thinking of like a subscription type thing. Possibly a plugin for Links to automatically update it??

Can't never could do nothing till he whupped old couldn't till he could...
Re: DMOZ Dumps.....
Well there isn't really any point in making a plugin to be honest because it is just as quick to import the dump using MySQLMan. With a plugin, it would be large in size and would just perform the same function as if you just imported the dump from the gzipped file.

I could write a quick subscription/signup script that will take orders and keep track of how many updates people want and when they want them. It would bill you the first time and then send you an email every time I update, giving you the option to get the updated version.....

Worth it?

Paul
Installations:http://wiredon.net/gt/
Support: http://wiredon.net/forum/

Re: DMOZ Dumps.....
I'm thinking of a plugin that would automatically access your secure site using a registration number, upload the subscribed-to block of the DMOZ dump, and automatically queue it for review prior to re-indexing/re-building... wow, that was a mouthful!!

Can't never could do nothing till he whupped old couldn't till he could...
Re: DMOZ Dumps.....
That shouldn't be too hard to do, although I'm not sure how I can set up the server to only allow access with a registration number.

(Although I have an idea) ;)

Paul
Installations:http://wiredon.net/gt/
Support: http://wiredon.net/forum/

Re: DMOZ Dumps.....
This will be the home of my dumps now........

ftp://ftp.wiredon.net/outgoing/

You can visit here whenever you like to keep track of how far I have got. The directory names will be visible but will appear to be empty, because I have set the ownership/permissions so that only the correct owners can view and download the files. If you want a certain dump, contact me and after paying you will get a username and password which can then be used to access the dump, e.g.:

ftp://username:password@ftp.wiredon.net/outgoing/category/

The dump will then be visible and you can download it or use wget or whatever.
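
If you would rather script the download than use a browser or wget, here is a minimal sketch using Python's standard `ftplib`. It splits a URL of the form shown above into its parts and fetches one file. The function names and the file name in the usage comment are hypothetical; the real host, directory and credentials are whatever you are given after paying.

```python
from ftplib import FTP
from urllib.parse import unquote, urlsplit


def parse_ftp_url(url):
    """Split ftp://user:password@host/path/ into its parts."""
    parts = urlsplit(url)
    if parts.scheme != "ftp":
        raise ValueError("expected an ftp:// URL")
    return {
        "host": parts.hostname,
        "user": unquote(parts.username or "anonymous"),
        "password": unquote(parts.password or ""),
        "path": parts.path,
    }


def download_dump(url, filename, dest):
    """Log in with the credentials in `url` and save `filename` to `dest`."""
    info = parse_ftp_url(url)
    with FTP(info["host"]) as ftp:
        ftp.login(info["user"], info["password"])
        ftp.cwd(info["path"])
        with open(dest, "wb") as out:
            # Binary mode matters here: the dumps are gzipped.
            ftp.retrbinary(f"RETR {filename}", out.write)


# Hypothetical usage (placeholder file name):
# download_dump("ftp://username:password@ftp.wiredon.net/outgoing/category/",
#               "category.tar.gz", "category.tar.gz")
```

Bear in mind that plain FTP sends the username and password unencrypted.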

Thanks :)

EDIT:
If you find any security holes with the setup, please be kind enough to let me know :).....I think I've got it covered though....

Paul
Installations:http://wiredon.net/gt/
Support: http://wiredon.net/forum/

Re: DMOZ Dumps.....
What's your pricing structure?

Warwick

http://www.humorlinks.com
Re: DMOZ Dumps.....
Hi,

Please contact me privately for prices. It depends on the dump you want and its size.

Thanks.

Paul
Installations:http://wiredon.net/gt/
Support: http://wiredon.net/forum/