Gossamer Forum

Importing DMOZ data interrupted....now what?

Greetings!

I issued the following command:

./nph-import.cgi --import RDF -source content.rdf.u8 --rdf-category="Top" --rdf-add-date="2002-01-01" --destination="/var/www/cgi-bin/path/cgi/admin/defs"

The import of DMOZ data got as far as Recreation, with over 1 million URLs imported.

The process was then killed, either by a reboot or by my client machine going offline; I'm not sure which.

I have now issued:

./nph-import.cgi --import RDF --rdf-update -source content.rdf.u8 --rdf-category="Top" --rdf-add-date="2002-01-01" --destination="/var/www/cgi-bin/path/cgi/admin/defs"

I noticed that even with --rdf-update it still started at the Adult category and worked its way down. This is not a new RDF file; it is the same one, so it seems to me that it should read from Adult to Recreation, realize that it stopped at Recreation, and resume from there. It is not doing that.

Is there a problem or is it working as designed? Is there any harm in letting it continue and simply getting rid of duplicates via a "get rid of duplicates" script that must be a part of Links SQL?

Thanks.

Last edited by:

takacsj: Apr 20, 2002, 6:52 AM
Re: [takacsj] Importing DMOZ data interrupted....now what? In reply to
If you are only doing a direct import (i.e. with no links already in the database other than those imported), I would suggest redoing the setup process, which should wipe all the SQL stuff back to default. Then run the process again. I'm assuming you were using cron for this? If not, it's definitely worth doing ;) My machine (a PC, not a server) crashes so often that I would only get to about 10k links!

Andy (mod)
andy@ultranerds.co.uk
Want to give me something back for my help? Please see my Amazon Wish List
GLinks ULTRA Package | GLinks ULTRA Package PRO
Links SQL Plugins | Website Design and SEO | UltraNerds | ULTRAGLobals Plugin | Pre-Made Template Sets | FREE GLinks Plugins!
Re: [Andy.] Importing DMOZ data interrupted....now what? In reply to
Thank you for your prompt reply. I will try the steps you suggest after waiting to see what the above execution does.

In any event, isn't there something "wrong" with --rdf-update, if it doesn't really work as documented? Or, is there something wrong in my above command line? The above is copied directly off my console.

What I see on my display is the script going through each category/subcategory, and then displaying a number to the right, like this:

TOPIC: Top/Arts/Aviation/Multimedia/Art ... 12

So, does this mean that it is adding those 12 urls, or that it is simply acknowledging their existence, and moving on without adding duplicate links?

By the way, doesn't redoing the setup drop or truncate the existing tables, so that my 1 million URLs would be lost?
Re: [takacsj] Importing DMOZ data interrupted....now what? In reply to
>>>-source content.rdf.u8 <<<

This part looks a bit odd. I think it should be more like:

--source="/path/to/content.rdf.u8"

Andy (mod)
Re: [Andy.] Importing DMOZ data interrupted....now what? In reply to
Hi Andy.

That file is in the same directory as the nph-import script, hence no path reference. There is a word for that in Perl programming, but it escapes me at the moment: something to do with not having to include a path when the file is located in the same directory, as it is 'blank-blank' to the script. In any event, it is executing properly and importing the URLs from the RDF file.

My question was more whether this is the way it is supposed to work when adding --rdf-update.

I'm probably going to start importing all over again, so it doesn't matter. For anyone out there, heed Andy's advice and execute the script so it continues regardless of your console connection.
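
One way to do that, as a minimal sketch: this uses nohup rather than the cron approach Andy mentioned, and run_detached is a hypothetical helper name, not something that ships with Links SQL. It starts any command in the background with its output sent to a log file, so the job is not tied to the console session.

```shell
#!/bin/sh
# Hypothetical helper (not part of Links SQL): start a long-running
# command so that it keeps going after the terminal session closes.
run_detached() {
    log_file=$1
    shift
    # nohup makes the command ignore SIGHUP. stdout and stderr go to the
    # log file, and stdin is detached from the console.
    nohup "$@" >"$log_file" 2>&1 </dev/null &
    # Print the PID of the background job so it can be checked later.
    echo "$!"
}

# Example call for the import discussed in this thread (commented out,
# since it needs the real script and RDF file to be present):
# run_detached import.log ./nph-import.cgi --import RDF \
#     --source="content.rdf.u8" --rdf-category="Top"
```

Progress can then be followed from any later session with tail -f import.log. This does not protect against a server reboot; it only covers the console connection dropping.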
Re: [takacsj] Importing DMOZ data interrupted....now what? In reply to
I believe adding --rdf-update should work, in the sense that it only adds links that are new or have been modified.

As for that string thing, that wasn't really the point I was getting at. You are using -source content.rdf.u8. Note that you are missing another - at the beginning of -source (it should be --source), and you are missing the = sign after --source (i.e. --source=content.rdf.u8).
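
To make that concrete, here is a rough sketch of the full command with --source written in the --name=value form. The options and paths are the ones quoted earlier in this thread; run_import is just a hypothetical wrapper so the argument list can be previewed with echo before running the real script. Nothing in this thread confirms whether the script actually rejects the single-dash form, so treat this as the suggested spelling rather than a confirmed fix.

```shell
#!/bin/sh
# Sketch only. The options are the ones quoted in this thread, with
# --source written in the --name=value form.
run_import() {
    # $1 is the program to run. Pass "echo" for a dry run that only
    # prints the arguments; pass ./nph-import.cgi to run the import.
    "$1" --import RDF --rdf-update \
        --source="content.rdf.u8" \
        --rdf-category="Top" \
        --rdf-add-date="2002-01-01" \
        --destination="/var/www/cgi-bin/path/cgi/admin/defs"
}

# Dry run: prints the argument list without touching the database.
run_import echo
```

Once the printed arguments look right, run_import ./nph-import.cgi would start the real import from the directory that holds the script.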

Hope that makes a bit more sense ;)

Andy (mod)
Re: [Andy.] Importing DMOZ data interrupted....now what? In reply to
Yes, yes. My bad. I see exactly what you mean now. Well, if this next go-around fails, I'll make sure everything is correct.

By the way, what is that word I'm looking for?
Re: [takacsj] Importing DMOZ data interrupted....now what? In reply to
I didn't even know there was a special word for it :P How about redundant? ;)

Andy (mod)