Gossamer Forum

dmoz import doesn't import anything!

Hi, I'm trying to use the dmoz.com import on a completely blank database; the only thing I have added is the one user, dmoz.

I type in this:

perl nph-import.cgi --import RDF --source=/home/ascifi/www/structure.rdf.u8.gz.txt --rdf-category="Top/Arts" --rdf-add-date="2001-01-10" --destination=/home/ascifi/www/cgi-bin/dir/admin/defs --create-missing-categories --rdf-user="dmoz"

It runs the program and takes about 10 minutes or so, but once it has completed there are no links, categories, or anything. Nothing has been imported. What am I doing wrong?

thanks

http://www.ASciFi.com/ - The Science Fiction Portal
Re: dmoz import doesn't import anything!
I tried everything without the quotation marks (") as well, since I was not 100% sure whether they are meant to be there. Anyway, same result.

I bet it is something obvious :)

http://www.ASciFi.com/ - The Science Fiction Portal
Re: dmoz import doesn't import anything!
I see your problem... You need to use content.rdf.whatever instead of structure.rdf.whatever

Jason Rhinelander
Gossamer Threads
jason@gossamer-threads.com
Re: dmoz import doesn't import anything!
OK... I bow my head in shame :)

http://www.ASciFi.com/ - The Science Fiction Portal
Re: dmoz import doesn't import anything!
OK, about how long should it take? Mine has been going for 452 minutes so far, according to top...

I am on a PIII 650 with 256MB RAM, and it is using almost 95% of the processor (there is not much other activity).

http://www.ASciFi.com/ - The Science Fiction Portal
Re: dmoz import doesn't import anything!
My worry is that phpMyAdmin seems to suggest that no links have been added at all yet.

This is the call I used; if there is something wrong with it, please tell me!

perl nph-import.cgi --import RDF --source=/home/ascifi/www/content.rdf.u8.gz.txt -rdf-category=Top/Arts --rdf-add-date=2001-01-10 --destination=/home/ascifi/www/cgi-bin/dir/admin/defs --create-missing-categories --rdf-user=dmoz

http://www.ASciFi.com/ - The Science Fiction Portal
Re: dmoz import doesn't import anything!
It has been going for 10 hours now. I guess it is not working? What is the best way to end the process and try again?

http://www.ASciFi.com/ - The Science Fiction Portal
Re: dmoz import doesn't import anything!
Well, it finished (it must have taken 12 hours or so), but no links, categories, or anything else appeared in the database.

Looking at what I typed, I used -rdf-category=Top/Arts instead of --rdf-category=Top/Arts. I don't know whether this matters; it did not give an error message like it does when you do not specify a category, so my guess is that it does not. Anyway, I am going to start it again just in case (my poor server!), but if there is something I am doing wrong, please oh please tell me :)

http://www.ASciFi.com/ - The Science Fiction Portal
Re: dmoz import doesn't import anything!
What is the file size of your content RDF file? That .gz makes me think that it might be gzipped...

If you are getting no output to your shell except for "Importing from /your/rdf/file", then there is definitely a problem (and quite possibly it is that the file is gzipped).

If your file IS gzipped (the file size will be around 130MB, if I recall correctly), then you will need to gunzip it or (the better option) use the --with-gzip /usr/bin/gzip option, which makes the import open an uncompressed stream directly from the gzipped file.

If it works successfully, you should see a number of categories displayed on the screen. The first happens to be "Top/Adult" due to alphabetical ordering; all of these category names should be displayed with "skipped" at the end of the line. Once it gets down to Top/Arts it should import those categories, then stop once it gets past Top/Arts (I don't remember which category that is).

Jason Rhinelander
Gossamer Threads
jason@gossamer-threads.com
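
For anyone who hits the same symptom: the file name says little about whether the data is really compressed, so it can help to look at the first two bytes before starting a long import. Every gzip file begins with the bytes 0x1f 0x8b. What follows is only a minimal shell sketch and is not part of Gossamer Links; the function name is_gzipped is made up for illustration.

```shell
# Return success if the file starts with the gzip magic bytes 0x1f 0x8b,
# whatever its extension claims.
# NOTE: is_gzipped is a hypothetical helper, not something shipped with
# nph-import.cgi.
is_gzipped() {
    # od prints the first two bytes as hex (" 1f 8b"); tr strips the padding.
    magic=$(od -An -tx1 -N2 "$1" | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}
```

It could be used as `is_gzipped /home/ascifi/www/content.rdf.u8.gz.txt && echo "still gzipped"`, with the path taken from the post above.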
Re: dmoz import doesn't import anything!
136018526. I assume this is bytes; it is what my FTP display shows.

I used LWP to upload it, and I think that will have uncompressed it automatically.

Do you want login access to my server to have a look?

http://www.ASciFi.com/ - The Science Fiction Portal
Re: dmoz import doesn't import anything!
OK, I will try the --with-gzip bit.

What is the best way to end the Perl process that is running, then?

http://www.ASciFi.com/ - The Science Fiction Portal
Re: dmoz import doesn't import anything!
Well, that's compressed then. The uncompressed file is somewhere between 600 and 700MB. Try using the --with-gzip="/usr/bin/gzip" option - it should solve your problems!

Jason Rhinelander
Gossamer Threads
jason@gossamer-threads.com
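
To illustrate what reading an uncompressed stream straight from the gzipped file means in practice, here is a small shell sketch. It pipes the output of gzip -dc into a reader, so the 600 to 700MB of uncompressed data never has to be written to disk. How nph-import.cgi does this internally is not shown in this thread, so treat the sketch as an analogy only; peek_gzipped is a hypothetical name.

```shell
# Print the first few lines of a gzipped file by decompressing to a pipe,
# without creating an uncompressed copy on disk.
# NOTE: peek_gzipped is a hypothetical helper for illustration only.
peek_gzipped() {
    file=$1
    lines=${2:-20}          # default to the first 20 lines
    # gzip -d decompresses, -c writes to stdout; head stops reading early.
    gzip -dc "$file" | head -n "$lines"
}
```

Running something like this on the content file first is a cheap way to confirm that it decompresses and looks like RDF before committing to an import that runs for hours.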
Re: dmoz import doesn't import anything!
Perhaps Control-C, or else you could get the process ID with ps -ax and then kill it with kill.

Jason Rhinelander
Gossamer Threads
jason@gossamer-threads.com
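
The ps and kill approach from the reply above could look roughly like the following shell sketch. Both function names are made up for illustration, and the two-second grace period before the forced kill is an arbitrary assumption.

```shell
# List running import processes. The [n] in the pattern keeps grep from
# matching its own command line.
# NOTE: both helpers here are hypothetical, not part of Gossamer Links.
list_import_procs() {
    ps ax | grep '[n]ph-import\.cgi'
}

# Stop a process by PID: send SIGTERM first, then SIGKILL only if the
# process is still around after a short wait.
stop_by_pid() {
    pid=$1
    kill "$pid" 2>/dev/null || return 1    # SIGTERM is kill's default signal
    sleep 2                                # arbitrary grace period
    if kill -0 "$pid" 2>/dev/null; then    # kill -0 only checks existence
        kill -9 "$pid"
    fi
}
```

The first column of the list_import_procs output is the process ID to pass to stop_by_pid.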
Re: dmoz import doesn't import anything!
OK, it stopped and is now working... well, running through categories.

http://www.ASciFi.com/ - The Science Fiction Portal
Re: dmoz import doesn't import anything!
Wwweeeeehhhayy, it worked... thank you so much...

I thought it was not gzipped because Alex suggested it might have been un-gzipped when it was downloaded, and the .txt extension that was put on it baffled me.

Thank you so much!

http://www.ASciFi.com/ - The Science Fiction Portal
Re: dmoz import doesn't import anything!
Great! :)

Because of this problem you've found, I've made a change to the RDF import so that it now checks whether the file is a binary file; if it is, it assumes it is gzipped. Some browsers uncompress on download but leave the name as .gz, some add .txt but leave the file gzipped, and who knows what else browsers might be doing, so this seems like a far more reliable way to determine whether the file is actually gzipped.

Jason Rhinelander
Gossamer Threads
jason@gossamer-threads.com
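
A rough idea of what a "binary means gzipped" check can look like is sketched below in shell. This is not the actual code in nph-import.cgi, which is not shown in this thread. The sketch assumes that "binary" means "a NUL byte appears in the first 512 bytes", and the names looks_binary and rdf_stream are hypothetical.

```shell
# Heuristic: call a file binary if any of its first 512 bytes is NUL.
# Plain RDF text should not contain NUL bytes.
# NOTE: looks_binary and rdf_stream are hypothetical helpers, and the
# 512-byte NUL test is an assumed definition of "binary".
looks_binary() {
    # od prints each byte as two hex digits; grep -w matches a whole "00".
    od -An -tx1 -N512 "$1" | grep -qw 00
}

# Emit the RDF as text, decompressing only when the file looks binary,
# so the file name and extension are ignored entirely.
rdf_stream() {
    if looks_binary "$1"; then
        gzip -dc "$1"
    else
        cat "$1"
    fi
}
```

With a check like this, `rdf_stream content.rdf.u8.gz.txt` yields the same text whether or not a browser or upload tool silently decompressed the file.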
Re: dmoz import doesn't import anything!
Thanks... that would have saved me a day of fiddling with it :) so I am sure it will help others.

http://www.ASciFi.com/ - The Science Fiction Portal
Re: dmoz import doesn't import anything!
Alex, in the Beta5 version, how do I import the RDF database into MS SQL 7? It did not work here... Does nph-import.cgi only work with MySQL?

-=-=-=-=-=-=-=-=-=-
For a better world.
Janio Le
-=-=-=-=-=-=-=-=-=-
Re: dmoz import doesn't import anything!
Hi,

Thanks. Because the import does not use GT::SQL (for performance reasons), it's still a little touchy.

I've got the RDF import working with MS SQL now, and it will be available in the final.

Cheers,

Alex

--
Gossamer Threads Inc.