Gossamer Forum

DMOZ import help

I am working on the dmoz .rdf import. The .rdf file is in the 700 MB range (uncompressed). The import has been running for 6 days now and is nowhere near completion. At last check, Catlinks were at 667,426 and Categories at 54,205.

My question is about performance. Does it sound odd that the import has been running for 6 days and is only 1/4 of the way complete? At this rate, we're talking 2-3 weeks for the total import (fingers crossed that it doesn't unexpectedly terminate along the way).
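For reference, the estimate above can be checked with a straight-line extrapolation. This is only a rough sketch: it assumes the import proceeds at a constant rate, which may not hold as the tables grow, and the function name is made up for illustration.

```python
def projected_total_days(elapsed_days: float, fraction_done: float) -> float:
    """Linear extrapolation: total run time if the rate stays constant."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_days / fraction_done


# Figures from the post: 6 days elapsed, about 1/4 complete.
elapsed = 6.0
total = projected_total_days(elapsed, 0.25)
remaining = total - elapsed

print(f"projected total: {total:.0f} days")
print(f"still to go: {remaining:.0f} days (about {remaining / 7:.1f} weeks)")
```

Under that assumption, the 2-3 week figure corresponds roughly to the time still remaining rather than the total.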

CPU usage is averaging 90-95% and memory only 10-12%. This leads me to believe it's not a performance issue on my end, as the memory isn't strapped and the machine is not using disk cache.

The machine: 400 MHz PII, 196 MB RAM, newest Perl, newest MySQL and mod_perl.

Any ideas on how to increase performance and lower import time?

The import will soon have been running for an entire week. Does this sound odd?


Re: DMOZ import help
Catlinks aren't the number of links you have imported.

Links is the table with the links in it. It should be reading 1,000,000 or so by now with that many categories imported.

Paul Wilson.
new - http://www.wiredon.net
Re: DMOZ import help
Hi,

This sounds very strange. The import took about 8 hours on a dual P3 800 with 1 GB of RAM, and I've heard 15-24 hours is the norm. Definitely not weeks, though.

You are using Links SQL 2 with nph-import.cgi, right? You aren't using the old Parse_RDF.pl script, are you?

Cheers,

Alex

--
Gossamer Threads Inc.
Re: DMOZ import help
Alex:

Look at the difference in horsepower:

>> 400 MHZ PII, 196 MB Ram, newest Perl, newest MySQL and mod_perl.

vs

>> dual p3 800 with 1 gig of ram

A single PII, running at half the clock speed of the dual P3s, with only 196 MB of RAM, not 1 GB.

Weeks... potentially!!

I remember the speed differences between my 486, Pentium, PII, and PIII chips, because I was always doing graphics processing. The jumps after the PII were not as impressive as the first few jumps from 386->486->Pentium->more MHz->PII.

But I got anywhere from a 10- to 30-fold increase in performance. 8 hours x 30 is 10 days.
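Spelled out, that arithmetic looks like the sketch below. It assumes the import time scales linearly with the slowdown factor, and it applies the 10x to 30x range (which comes from graphics workloads, not from this import) to the 8-hour baseline quoted earlier in the thread; the function name is just for illustration.

```python
BASELINE_HOURS = 8  # import time reported on the dual P3 800


def scaled_days(baseline_hours: float, slowdown: float) -> float:
    """Run time in days if the job is `slowdown` times slower than baseline."""
    return baseline_hours * slowdown / 24


for factor in (10, 30):
    days = scaled_days(BASELINE_HOURS, factor)
    print(f"{factor}x slower: {days:.1f} days")
```

So the range works out to somewhere between a little over 3 days and 10 days, if the scaling assumption holds at all.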



PUGDOG® Enterprises, Inc.
FAQ:http://LinkSQL.com/FAQ
Forum:http://LinkSQL.com/forum
Re: DMOZ import help
Alex,
The version is the current 2.0, with nph-import.cgi.
Tomorrow will be a full week and it's nowhere near completion.

Links = 732617 Categories = 56467 and slowly growing.

The database is on a second machine, about 5 feet away. This leaves me wondering whether the slowdown has anything to do with the database being located on a different machine and having to be accessed over the network.

I was wondering if I should install a second copy of Links SQL on the database server, one that would only do the import. There is no web server installed on that machine, so it would not really be used as a "second installation". This would eliminate the import being done across a network.

Not sure if the network would be the issue though.

What are your thoughts?

Re: DMOZ import help
Hi,

I don't think the network would be an issue, but give it a try. Let me know how it works out.
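A rough back-of-the-envelope check, using the numbers reported earlier in the thread (732,617 links after roughly 7 days), is sketched below. The 1 ms round trip and the 10 queries per link are pure assumptions for illustration, not measurements and not a statement of how nph-import.cgi actually talks to the database.

```python
SECONDS_PER_DAY = 86_400


def seconds_per_link(links_imported: int, elapsed_days: float) -> float:
    """Average wall-clock time spent per imported link."""
    return elapsed_days * SECONDS_PER_DAY / links_imported


def network_share(round_trip_s: float, queries_per_link: int,
                  per_link_s: float) -> float:
    """Fraction of the per-link time spent waiting on network round trips."""
    return round_trip_s * queries_per_link / per_link_s


# Reported in the thread: 732,617 links after about a week.
per_link = seconds_per_link(732_617, 7)

# ASSUMED values: 1 ms LAN round trip, 10 queries per imported link.
share = network_share(0.001, 10, per_link)

print(f"{per_link:.2f} s per link, network share about {share:.1%}")
```

With those assumed values the network accounts for only a small slice of the time per link, which fits with the CPU sitting at 90-95%. Measuring the real round-trip time between the two machines would replace the guess.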

Cheers,

Alex

--
Gossamer Threads Inc.