HELP: DMOZ import, missing + incomplete data

I've "attempted" to import the .rdf dump from DMOZ. The file was content.rdf.gz: 139 MB compressed and 700+ MB uncompressed.

My import took 26 hours (400 MHz PII / 256 MB RAM) and only brought in 456,000 links. DMOZ shows 2.3 million, so I am way off. As far as I can tell, the import didn't die. During the import, CPU usage was nearly maxed at 85-90% and memory usage was around 10%.
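For a sense of scale, here is a quick back-of-the-envelope calculation using only the figures quoted above. It assumes the import rate stays constant, which may not hold on a real run, so treat the projection as a rough estimate only.

```python
# Figures quoted in the post above.
imported_links = 456_000
expected_links = 2_300_000
elapsed_hours = 26

# Share of the expected links that actually arrived.
fraction_done = imported_links / expected_links

# Average import speed over the whole run.
links_per_second = imported_links / (elapsed_hours * 3600)

# Projected time for the full dump, assuming the same constant rate.
projected_hours = expected_links / (imported_links / elapsed_hours)

print(f"imported: {fraction_done:.1%} of expected")
print(f"rate: {links_per_second:.1f} links/s")
print(f"projected full import: {projected_hours:.0f} hours")
```

That works out to roughly 20% of the expected links at about 5 links per second, so a complete import at this speed would need around 131 hours.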

On the dynamic site, I only see the following categories:
- Adult (0)
- Arts (0)
- Business (0)
- Computers (0)

Beside each category, the number of links listed is zero (0). If you navigate a few categories deeper, some have numbers (137) and others show (0) even though they DO HAVE plenty of links in them.

Am I missing something here? Do I need to do anything after such a large import? If so, would someone be so kind as to detail the steps to run AFTER the import to make everything golden?

One other thing: which .rdf file should be used, content.rdf.gz or content.rdf.u8.gz? There are differences between them, and this may be the issue.

Here is my import command, hope it's right:
./nph-import.cgi --import RDF --destination=/usr/local/apache/htdocs/scripts/links/admin/defs --source="content.rdf" --rdf-category="Top" --rdf-add-date="2001-02-05"

Can someone help me out here? With a 26-hour import, it's hard to keep running it over and over again until I get it right.

Lastly, do we need to import the CONTENT and STRUCTURE dumps?
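One way to check how many links the dump really contains, independent of the import script, is to count the records in the file itself. The sketch below assumes each link in the content dump is an `<ExternalPage about="...">` element whose opening tag starts on its own line; the function names are made up for illustration and are not part of Gossamer Links.

```python
import gzip
import io

def count_external_pages(stream):
    """Count lines that open an <ExternalPage> record in an RDF dump."""
    count = 0
    for line in stream:
        if line.lstrip().startswith("<ExternalPage "):
            count += 1
    return count

def count_in_dump(path):
    """Stream a plain or gzipped dump line by line, never loading it whole."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8", errors="replace") as fh:
        return count_external_pages(fh)

# Tiny in-memory sample standing in for the real dump.
sample = io.StringIO(
    '<Topic r:id="Top/Arts">\n'
    '  <link r:resource="http://a.example/"/>\n'
    '</Topic>\n'
    '<ExternalPage about="http://a.example/">\n'
    '  <d:Title>A</d:Title>\n'
    '</ExternalPage>\n'
    '<ExternalPage about="http://b.example/">\n'
    '</ExternalPage>\n'
)
sample_count = count_external_pages(sample)
```

Running `count_in_dump("content.rdf.u8.gz")` on the real file and comparing the result with the number of rows in the links table would show how far short the import fell.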




Re: HELP: DMOZ import, missing + incomplete data
The only file I have ever used is content.rdf.u8.gz, and DMOZ recommends this too.

Did you follow the instructions and re-index your database, then rebuild?



Paul Wilson.
new - http://www.wiredon.net
Re: HELP: DMOZ import, missing + incomplete data
Do a repair tables. I had this problem when importing, and that normally sorted it out. If it is MySQLMan that is telling you that you only have 400,000 links or whatever, then I don't know. Try repairing the tables; it helped me.

http://www.ASciFi.com/ - The Science Fiction Portal
Re: HELP: DMOZ import, missing + incomplete data
It's possible that the import stopped or timed out, so fewer links were added than expected.

Paul Wilson.
new - http://www.wiredon.net
Re: HELP: DMOZ import, missing + incomplete data
You must do a repair table after doing any import to update the counters and new/cool flags.
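To illustrate why the counters need recomputing after a bulk load, here is a minimal sketch of rolling link counts up the category tree. It assumes the number shown beside a category includes the links in all of its subcategories. This is not the actual repair code from Gossamer Links; `rollup_counts` and its data layout are hypothetical.

```python
from collections import defaultdict

def rollup_counts(direct_counts):
    """Return total links per category, including all subcategories.

    direct_counts maps a category path such as "Arts/Music" to the
    number of links filed directly in that category.
    """
    totals = defaultdict(int)
    for path, n in direct_counts.items():
        parts = path.split("/")
        # Credit the links to the category itself and to every ancestor.
        for depth in range(1, len(parts) + 1):
            totals["/".join(parts[:depth])] += n
    return dict(totals)

counts = rollup_counts({"Arts/Music": 137, "Arts/Music/Jazz": 12, "Business": 5})
```

Under that assumption, a top-level category such as Arts would keep showing (0) until a pass like this has run, even when its subcategories already hold links.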

You should use content.rdf or content.rdf.u8. The structure file is not used.

Cheers,

Alex

--
Gossamer Threads Inc.
Re: HELP: DMOZ import, missing + incomplete data
Yeah, it could have died. I'm not sure, as the import was an unattended process.

When I run the repair tables from the admin, the following error shows up in my error log. The browser just displays a popup error box about unrecognized data.

panic: POPSTACK
Callback called exit at /usr/local/apache/htdocs/scripts/links/admin/nph-build.cgi line 663.

BEGIN failed--compilation aborted at /usr/local/apache/htdocs/scripts/links/admin/nph-build.cgi line 663.

DMOZ recommends the U8 files. Should I try those? And how can I restart the import from where it left off, so that it skips everything already imported (i.e., doesn't re-import the 26 hours of data that's already there)?
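I don't know whether nph-import.cgi has a resume option. One general approach is to filter the dump against what is already in the database before importing. The sketch below only shows the idea: `pending_records` is a hypothetical helper, the records are assumed to be already parsed into (category, url, title) tuples, and the set of existing pairs would have to come from a query on the links table.

```python
def pending_records(records, existing):
    """Yield only the records that are not yet in the database.

    records:  iterable of (category, url, title) tuples from the dump.
    existing: set of (category, url) pairs already imported.
    """
    for category, url, title in records:
        # A URL may be listed in more than one category, so the
        # lookup key is the (category, url) pair, not the URL alone.
        if (category, url) in existing:
            continue
        yield category, url, title

records = [
    ("Top/Arts", "http://a.example/", "A"),
    ("Top/Arts", "http://b.example/", "B"),
    ("Top/Business", "http://a.example/", "A again"),
]
existing = {("Top/Arts", "http://a.example/")}
todo = list(pending_records(records, existing))
```

With 456,000 links already loaded, the set of existing pairs should fit in memory, so the filter adds only one lookup per record.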