Gossamer Forum

DMOZ Import CPU

What is a normal CPU level for a dmoz import?

It is holding steady at about 95% at the moment. Is that normal?

Paul Wilson.
http://www.wiredon.net/gt/
http://www.perlmad.com/
Re: DMOZ Import CPU
Any process that grabs the foreground, and demands it, can use that much, especially if the other processes are willing to give it up.

The short answer is _yes_, that is quite reasonable for a dmoz import. It's a pig of a process.

If you are using the direct method, you are using a lot of resources, including gzip... it's intensive.


PUGDOG® Enterprises, Inc.
FAQ:http://LinkSQL.com/FAQ
Forum:http://LinkSQL.com/forum
Re: DMOZ Import CPU
Ack... yes, I'm using the gzip method, but I followed GClemmons' advice: I put the command in a file called import.sh, added the & at the end of the line, and ran that so it runs in the background.


Paul Wilson.
http://www.wiredon.net/gt/
http://www.perlmad.com/
Re: DMOZ Import CPU
I just set the NICE level (priority) to -20 (highest priority) - is that sensible?

It is supposed to make it run faster.

Paul Wilson.
http://www.wiredon.net/gt/
http://www.perlmad.com/
Re: DMOZ Import CPU
The problem is that '&' sets it up as a background process on that terminal, but it's still going to grab as much CPU as it can. And if you've changed its priority, it will steal CPU from other processes, rather than the other processes stealing it from that one.

It's a big, long process. It cannot run fast. It won't run fast. You just have to bear with it.
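
To make the difference between "background" and "priority" concrete, here is a minimal Python sketch. It is not part of Links SQL, and the function name `run_in_background` is made up for illustration. It starts a command without waiting for it (roughly what '&' does) and raises the child's nice value so that other processes win when the CPU is contested. A higher nice value means a lower priority, and going below the current value (toward -20) normally needs root.

```python
import os
import subprocess


def run_in_background(cmd, increment=10, **popen_kwargs):
    """Start cmd without waiting for it, at a lower scheduling priority.

    increment is added to the child's nice value. Higher nice means
    lower priority; on Linux the range is typically -20 to 19.
    """
    # preexec_fn runs in the child just before exec, so only the child
    # is reniced. The calling process keeps its own priority.
    return subprocess.Popen(
        cmd,
        preexec_fn=lambda: os.nice(increment),
        **popen_kwargs,
    )
```

Assuming the import command lives in import.sh as described earlier in the thread, `run_in_background(["sh", "import.sh"])` would start it and return immediately. The import still uses all the CPU nobody else wants, so it should not finish noticeably later on an otherwise idle machine.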



PUGDOG® Enterprises, Inc.
FAQ:http://LinkSQL.com/FAQ
Forum:http://LinkSQL.com/forum
Re: DMOZ Import CPU
The only other processes running are gzip and the process monitor CGI script that I am using, so hopefully increasing the priority won't harm anything.

Alex, if you are listening, PLEASE think of a way to import without having to skip all the unneeded categories first. :)

Paul Wilson.
http://www.wiredon.net/gt/
http://www.perlmad.com/
Re: DMOZ Import CPU
Someone said there was a pre-parser for the dmoz file that would cut it up before import. I haven't been able to locate it.
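
This is not that pre-parser, but the idea can be sketched in a few lines of Python. The sketch rests on assumptions about the dump layout that should be checked against the real file: each link sits in an `<ExternalPage ...>` to `</ExternalPage>` block, each block has a `<topic>` line giving its category, and every tag starts on its own line. The function names are made up for illustration.

```python
import gzip


def filter_pages(lines, prefix):
    """Yield the lines of each <ExternalPage> block whose <topic> starts with prefix.

    Everything outside <ExternalPage> blocks is dropped, so the output is
    a rough cut of the links only, not a complete RDF document.
    """
    block = None   # lines of the block being collected, or None between blocks
    keep = False   # whether the current block's topic matched the prefix
    for line in lines:
        stripped = line.strip()
        if block is None:
            if stripped.startswith("<ExternalPage"):
                block, keep = [line], False
            continue
        block.append(line)
        if stripped.startswith("<topic>"):
            topic = stripped[len("<topic>"):].split("<", 1)[0]
            keep = topic.startswith(prefix)
        elif stripped.startswith("</ExternalPage>"):
            if keep:
                yield from block
            block = None


def filter_dump(src_path, dst_path, prefix):
    """Stream a gzipped dump through filter_pages without unpacking it to disk."""
    with gzip.open(src_path, "rt", encoding="utf-8") as src, \
         gzip.open(dst_path, "wt", encoding="utf-8") as dst:
        dst.writelines(filter_pages(src, prefix))
```

With a prefix such as `"Top/Regional/"`, only the link blocks under that category would survive. Whether the importer accepts such a trimmed file without the surrounding header and `<Topic>` entries is untested here.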


PUGDOG® Enterprises, Inc.
FAQ:http://LinkSQL.com/FAQ
Forum:http://LinkSQL.com/forum
Re: DMOZ Import CPU
That sounds very useful - will have to have a look for that.

Thanks.



Paul Wilson.
http://www.wiredon.net/gt/
http://www.perlmad.com/
Re: DMOZ Import CPU
Hi Paul,

Yes, 95% CPU is quite normal. CPU crunching is the main bottleneck and you would only see something lower if you had a very fast CPU and your disk drive became the bottleneck.

Setting priority to -20 is not recommended if you are using this system for anything else important, as the import will then get almost all CPU resources, leaving very little for anything else running on the machine. However, if it's your own workstation and not a server, then that's fine.

Cheers,

Alex

--
Gossamer Threads Inc.
Re: DMOZ Import CPU
Hi Alex,

It was Ed's RAQ and it took me 4 1/2 days to import 400,000 links from the Regional Category.

The whole category is over 600,000 links, so I decided to kill it after 400,000, as it was taking so long and the server was slowing down.
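
For a sense of scale, those figures can be turned into a rate and a rough estimate of what was left. The sketch below assumes the import speed stayed constant, which the slowing server suggests it did not, and treats 600,000 as the total even though the category is described as "over" that, so the result is a lower bound.

```python
SECONDS_PER_DAY = 86_400


def import_rate(links_done, days_elapsed):
    """Average links imported per second so far."""
    return links_done / (days_elapsed * SECONDS_PER_DAY)


def days_remaining(links_total, links_done, days_elapsed):
    """Days still needed if the average rate so far holds."""
    rate_per_day = links_done / days_elapsed
    return (links_total - links_done) / rate_per_day


print(round(import_rate(400_000, 4.5), 2))              # about 1.03 links per second
print(round(days_remaining(600_000, 400_000, 4.5), 2))  # about 2.25 more days
```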



Paul Wilson.
http://www.wiredon.net/gt/
http://www.perlmad.com/