Apr 20, 2002, 6:50 AM

takacsj

Novice (47 posts)

Post #1 of 8

Importing DMOZ data interrupted....now what?

Greetings!

I issued the following command:

./nph-import.cgi --import RDF -source content.rdf.u8 --rdf-category="Top" --rdf-add-date="2002-01-01" --destination="/var/www/cgi-bin/path/cgi/admin/defs"

The import of the DMOZ data got as far as the Recreation category, with over 1 million URLs imported.

The process was then killed, either by a reboot or by my client machine going offline; I am not sure which.

I have now issued:

./nph-import.cgi --import RDF --rdf-update -source content.rdf.u8 --rdf-category="Top" --rdf-add-date="2002-01-01" --destination="/var/www/cgi-bin/path/cgi/admin/defs"

I noticed that even with --rdf-update it still started at the Adult category and worked its way down. This is not a new RDF file; it is the same one, so it seems to me that it should read from Adult to Recreation, realize that it stopped at Recreation, and resume from there. It is not doing that.

Is there a problem or is it working as designed? Is there any harm in letting it continue and simply getting rid of duplicates via a "get rid of duplicates" script that must be a part of Links SQL?

Thanks.

Last edited by:

takacsj: Apr 20, 2002, 6:52 AM

Apr 20, 2002, 7:35 AM

Andy

Veteran / Moderator (18441 posts)

Post #2 of 8

Re: [takacsj] Importing DMOZ data interrupted....now what?

If you are only doing a direct import (i.e. with no links already in the database other than those imported), I would suggest redoing the setup process, which should wipe all the SQL data back to default. Then run the import again. I'm assuming you were using cron for this? If not, it's definitely worth doing ;)

My machine (PC, not server) crashes so often that I would only get to about 10k links!

Andy (mod)
andy@ultranerds.co.uk

Apr 20, 2002, 5:30 PM

takacsj

Novice (47 posts)

Post #3 of 8

Re: [Andy.] Importing DMOZ data interrupted....now what?

Thank you for your prompt reply. I will try the steps you suggest after waiting to see what the above execution does.

In any event, isn't there something "wrong" with --rdf-update, if it doesn't really work as documented? Or, is there something wrong in my above command line? The above is copied directly off my console.

What I see on my display is the script going through each category/subcategory, and then displaying a number to the right, like this:

TOPIC: Top/Arts/Aviation/Multimedia/Art ... 12

So, does this mean that it is adding those 12 urls, or that it is simply acknowledging their existence, and moving on without adding duplicate links?
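Either way, one way to check is to total those trailing numbers from a saved copy of the console output and compare the sum with the link count in the database before and after the run. A minimal sketch, assuming every progress line has exactly the shape shown above; `sum_topic_counts` and `import.log` are hypothetical names, not part of Links SQL:

```shell
# Sum the trailing counts from lines shaped like "TOPIC: <category> ... <n>".
# Reads the files given as arguments, or standard input if none are given.
# Lines that do not start with "TOPIC: " or do not end in a number are skipped.
sum_topic_counts() {
    awk '/^TOPIC: / && $NF ~ /^[0-9]+$/ { total += $NF } END { print total + 0 }' "$@"
}

# Example (assumes the console output was saved to import.log):
#   sum_topic_counts import.log
```

This does not say what the number means; it only gives a figure to compare against the database.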

By the way, doesn't a redo of the Setup drop/truncate existing tables, hence my 1 million urls would be lost?

Apr 21, 2002, 2:35 AM

Andy

Veteran / Moderator (18441 posts)

Post #4 of 8

Re: [takacsj] Importing DMOZ data interrupted....now what?

>>>-source content.rdf.u8 <<<

This line looks a bit odd. I think it should be more like:

--source="/path/to/content.rdf.u8"

Andy (mod)
andy@ultranerds.co.uk

Apr 21, 2002, 6:15 AM

takacsj

Novice (47 posts)

Post #5 of 8

Re: [Andy.] Importing DMOZ data interrupted....now what?

Hi Andy.

That file is in the same directory as the nph-import script, hence no path reference. There is a word for that in Perl programming, but it escapes me at the moment; it has something to do with not having to include a path when the file is located in the same directory, because it is 'blank-blank' to the script. In any event, it is executing properly and importing the URLs from the RDF file.

My question was more about whether this is the way it is supposed to work when adding --rdf-update.

I'm probably going to start importing all over again, so it doesn't matter. For anyone out there, heed Andy's advice and execute the script so it continues regardless of your console connection.
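A minimal sketch of one way to do that, assuming a POSIX shell with `nohup` available. The `run_detached` function and the `import.log` file name are hypothetical, not part of Links SQL, and this only protects against a dropped console session, not against a server reboot:

```shell
# Start a long-running command detached from the terminal so that it keeps
# running if the console connection drops. All output goes to a log file.
# Usage: run_detached <log-file> <command> [args...]
# Prints the process ID of the background job.
run_detached() {
    log_file="$1"
    shift
    nohup "$@" > "$log_file" 2>&1 &
    echo "$!"
}

# Example with the import command from this thread:
#   run_detached import.log ./nph-import.cgi --import RDF --source=content.rdf.u8 \
#       --rdf-category="Top" --rdf-add-date="2002-01-01" \
#       --destination="/var/www/cgi-bin/path/cgi/admin/defs"
```

Progress can then be followed from any later session with `tail -f import.log`.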

Apr 21, 2002, 6:23 AM

Andy

Veteran / Moderator (18441 posts)

Post #6 of 8

Re: [takacsj] Importing DMOZ data interrupted....now what?

I believe adding --rdf-update should work, in that it only adds links that are new or have been modified.

As for that string thing, that wasn't really the point I was getting at. You are using -source content.rdf.u8. Note that you are missing another - at the beginning of -source (it should be --source), and you are missing the = sign after --source (i.e. --source=content.rdf.u8).
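For reference, here is a sketch of the resume command with every option spelled in the form suggested above. All flags and values are the ones already quoted in this thread; the `SOURCE` and `DEFS` variables and the `resume_cmd` function are just illustrative names, and the function only prints the command so it can be checked before running. Whether nph-import.cgi actually treats `-source content.rdf.u8` any differently is not confirmed in this thread.

```shell
# Values copied from the commands quoted earlier in the thread.
SOURCE="content.rdf.u8"
DEFS="/var/www/cgi-bin/path/cgi/admin/defs"

# Print the resume command (dry run) rather than executing it.
resume_cmd() {
    echo ./nph-import.cgi --import RDF --rdf-update \
        --source="$SOURCE" \
        --rdf-category="Top" \
        --rdf-add-date="2002-01-01" \
        --destination="$DEFS"
}

resume_cmd
```

To run it for real, pass the same arguments to the script directly instead of echoing them.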

Hope that makes a bit more sense ;)

Andy (mod)
andy@ultranerds.co.uk

Apr 21, 2002, 6:26 AM

takacsj

Novice (47 posts)

Post #7 of 8

Re: [Andy.] Importing DMOZ data interrupted....now what?

Yes, yes. My bad. I see exactly what you mean now. Well, if this next go-around fails, then I'll make sure everything is correct.

By the way, what is that word I'm looking for?

Apr 21, 2002, 6:33 AM

Andy

Veteran / Moderator (18441 posts)

Post #8 of 8

Re: [takacsj] Importing DMOZ data interrupted....now what?

I didn't even know there was a special word for it :P

How about redundant? ;)

Andy (mod)
andy@ultranerds.co.uk