I've seen several postings here concerning the import of the dmoz.org .rdf dump and I have a few questions.
I would like to import ALL of the dmoz data (600-700 MB of .rdf data) and need help with the command line.
I've got the content.rdf.gz file on a Linux machine, but it doesn't appear to be gzipped at all. The file is 139 MB and has a .gz extension, yet if I cat the file it displays text, not compressed data. The file command also tells me that it's ASCII text. This is odd to me. If I copy the file to my home PC, the .gz is the same size as on my Linux machine but uncompresses to 702 MB. No big deal, it could be something on the Unix machine.
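For anyone who wants to reproduce the check, here is a minimal Python sketch of one way to test whether a file is really gzipped. It relies only on the fact that gzip streams begin with the two bytes 0x1f 0x8b. The helper names is_gzipped and uncompressed_size are made up for illustration and are not part of any import tool.

```python
import gzip

# Every gzip stream starts with these two magic bytes.
GZIP_MAGIC = b"\x1f\x8b"


def is_gzipped(path):
    """Return True if the file at `path` starts with the gzip magic bytes."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def uncompressed_size(path, chunk_size=1 << 20):
    """Return the decompressed size in bytes, reading in chunks to save memory."""
    total = 0
    with gzip.open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


# Example usage (the file name is the one from this post):
#   if is_gzipped("content.rdf.gz"):
#       print(uncompressed_size("content.rdf.gz"), "bytes after decompression")
#   else:
#       print("not gzip data, despite the .gz extension")
```

If is_gzipped returns False on the Linux copy but True on the PC copy, the two files are not actually identical even though their sizes match.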
The question I have is:
How do I import ALL of the dmoz data? I will more than likely import twice a month to keep the links fresh. After I populate the database once (with all data), do I need to re-import all of the data or can I provide a flag to only import the data that is NEW or has CHANGED?
Please provide the command line parameters for both. I'm new to this and would appreciate it if they could be spelled out exactly as needed: once for a FULL import and once for an INCREMENTAL import.
Thank you !!!