Gossamer Forum

Open Directory RDF Dump Integration

Quote Reply
Open Directory RDF Dump Integration
Is there any way to use the RDF dump of the ODP data (categories and sites) in Links 2.0 (not SQL)? Any help would be greatly appreciated. The dumps are located here: http://dmoz.org/rdf.html.

Beau E. Gast
beaugast@hotpop.com
Quote Reply
Re: Open Directory RDF Dump Integration In reply to
I just tried the RDF import with Links SQL. The current RDF is a 55 MB .gz file, and it's over 270 MB uncompressed.
Quote Reply
Re: Open Directory RDF Dump Integration In reply to
I wrote an RDF conversion program for our Hyperseek users. You can get it from: ftp://support.ccs.net/rdf-hs.cgi

This assumes that you have the 2 .rdf files in the same directory as the program. It will output subcategories.dat and hyperseek.dat (in Hyperseek format, but you can easily edit the .cgi and change filenames and field orders to fit Links).

Please note: The .rdf files are HUGE (I think it's up to 100,000 categories and 700,000 listings), so you'd better have tons of disk space, plenty of time, and a little patience when converting.

Enjoy, John Cokos
Quote Reply
Re: Open Directory RDF Dump Integration In reply to
pugdog,

Where are the RDF dumps located on the server you're using? I'd rather not suffer through that download and just add it to my directory through your site. OK? If not, I'll understand!

Meanwhile... Thanks to jcokos for that WONDERFUL script! I tried it and it works great.

Thanx,

Beau

[This message has been edited by BeauGast (edited August 25, 1999).]
Quote Reply
Re: Open Directory RDF Dump Integration In reply to
Glad to help ...
Quote Reply
Re: Open Directory RDF Dump Integration In reply to
I don't have them on my server any more. I used a very small script via telnet:

Code:
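#!/usr/local/bin/perl
use strict;
use warnings;
use LWP::Simple;
# Fetch the dump straight onto the server; keep the .gz suffix so gunzip accepts it.
my $url = "http://dmoz.org/rdf/content.rdf.gz";
my $rc  = LWP::Simple::getstore($url, "content.rdf.gz");
die "Download failed (HTTP status $rc)\n" unless is_success($rc);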

to suck the file from dmoz to my server without having to download/upload it.

This is much faster going from their server to your server. Then just "gunzip" the file and run your import.

You'll need 300+ MB of disk space to do that.
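
If disk space is tight, one possible workaround is to read the compressed file directly instead of gunzipping it first. The following is only a rough sketch: it assumes the core IO::Uncompress::Gunzip module is available, that each listing in the dump starts with an `<ExternalPage` tag (an assumption about the dump format), and `count_listings` is a made-up helper name. It just counts listings, but the same loop could feed an import.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw($GunzipError);

# Count listing records in a gzipped dump without unpacking it to disk.
# Assumption: every listing begins with an <ExternalPage ...> tag.
sub count_listings {
    my ($gz_path) = @_;
    my $in = IO::Uncompress::Gunzip->new($gz_path)
        or die "Cannot open $gz_path: $GunzipError\n";
    my $count = 0;
    while (defined(my $line = $in->getline)) {
        $count++ if $line =~ /<ExternalPage\b/;
    }
    $in->close;
    return $count;
}

# Usage: perl count.pl content.rdf.gz
print count_listings($ARGV[0]), " listings\n" if @ARGV;
```

Reading line by line keeps memory use small, at the cost of decompressing again on every run.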
Quote Reply
Re: Open Directory RDF Dump Integration In reply to
One of the things I've found with the DMOZ data is that not only is it very big, but it's not really "good" (for lack of a better word).

Aside from the large number of categories, many of the links have very cheesy descriptions, like "This is my girlfriend's cool home page" ... so take the data with a grain of salt.
Quote Reply
Re: Open Directory RDF Dump Integration In reply to
Now I get the error:

[Thu Aug 26 15:47:36 1999] [error] [client 209.214.146.94] malformed header from script. Bad header=..............................: /data1/hypermart.net/yeehaw/private/admin/data/rdfconvert.cgi

Can you tell what's wrong??
Quote Reply
Re: Open Directory RDF Dump Integration In reply to
The file jcokos posted was meant to run from the command line and not from the web. The script will time out anyway. So, in other words, install Perl on your own computer or find a host that provides telnet access.
Quote Reply
Re: Open Directory RDF Dump Integration In reply to
If you don't have access to a command line, or Perl on your own computer, then you can easily modify the script to run from a browser, as follows:

Right at the top, add the following 2 lines (under the #!/usr/local/bin/perl):

print "HTTP/1.0 200 OK\n";
print "Content-Type: text/plain\n\n";

Then, rename the program to "nph-import.cgi" (nph-anything) so that it'll be able to run for an extended period from the browser.

It may still time out, but you'll get past the initial errors.
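
Here is a minimal sketch of how the top of such an nph- script might look. The two header lines are the ones given above; turning off output buffering and printing a progress line every so often are additions of mine that may help keep the connection from sitting idle. The helper names `start_nph_response` and `report_progress` and the loop are made up for illustration and are not part of the original converter.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Print the status line and headers that an nph- script must supply itself,
# and turn off buffering so later output reaches the browser right away.
sub start_nph_response {
    my ($out) = @_;
    my $old = select($out);
    $| = 1;                       # unbuffer this handle
    select($old);
    print {$out} "HTTP/1.0 200 OK\n";
    print {$out} "Content-Type: text/plain\n\n";
}

# Hypothetical helper: print a progress line every $every records.
sub report_progress {
    my ($out, $n, $every) = @_;
    print {$out} "$n records converted\n" if $n % $every == 0;
}

start_nph_response(\*STDOUT);
for my $n (1 .. 3000) {           # stand-in for the real conversion loop
    report_progress(\*STDOUT, $n, 1000);
}
```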

Quote Reply
Re: Open Directory RDF Dump Integration In reply to
Ok. Got it. Thanks for all the help!

Beau Gast
beaugast@hotpop.com
Quote Reply
Re: Open Directory RDF Dump Integration In reply to
Hi!
Is there any way to get only some categories into my Links DB?

Background: I run a site about live role-playing, and would like to copy just the few categories related to this into my own database!

bye
Tiggr

------------------
visit http://larp-welt.de/
the resource for german live roleplaying


Quote Reply
Re: Open Directory RDF Dump Integration In reply to
At about line 80, in the section where it's going through the big RDF, is this line:

$Category =~ s/_/ /g;

After that, put something like this:

next if $Category !~ "Something";

Where "Something" would be a string to match in the category name (like "Role Playing" or "RPG" or something similar). That change would force it to skip any listing in a category that doesn't meet your criteria.
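
To show the idea in context, here is a small stand-alone sketch of the same filter. It is not the converter's actual code: the `wanted_category` helper and the sample category names are made up, and it accepts several match strings at once, ignoring case. In the real script only the `next if ...` line would go right after the `s/_/ /g` line.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Return true when a category name matches any of the wanted patterns.
sub wanted_category {
    my ($category, @patterns) = @_;
    $category =~ s/_/ /g;         # same underscore cleanup as the converter
    return scalar grep { $category =~ /$_/i } @patterns;
}

my @patterns = ('Role Playing', 'RPG');   # the example strings from above
for my $Category ('Top/Games/Role_Playing/Live_Action', 'Top/Arts/Music') {
    next if !wanted_category($Category, @patterns);   # skip everything else
    print "keeping $Category\n";
}
```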

John
Quote Reply
Re: Open Directory RDF Dump Integration In reply to
Hi all....

I moved the rdf conversion utility to a new url:
http://www.iwebsupport.com/files/rdf-convert.zip

Apologies to those of you that were going to the old location and finding nothing.

I'll post here (as well as to our website) the URL of the selective import utility, that'll allow you to selectively download any category (or categories) from our Converted Open Directory Database. As soon as I complete the code, I'll let you all know where to get it.

John

[This message has been edited by jcokos (edited September 30, 1999).]
Quote Reply
Re: Open Directory RDF Dump Integration In reply to
How about posting a Links-customized version of the rdf-convert script? I'm having problems with Links telling me there's no date for the files, and I'm sure DMOZ includes that in their files. Sorry folks, I'm NOT a programmer by any means!

Thanks so much!!!
Quote Reply
Re: Open Directory RDF Dump Integration In reply to
Actually, DMOZ only has Title, Description, URL, and Category.... no unique ID, no date, no email, nothing like that at all...
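
Since Links wants an ID and a date on every record, a converter has to make those up itself. Below is a rough sketch of one way to do it. The field order is an assumption based on a default Links 2.0 links.def (ID, Title, URL, Date, Category, Description, Contact Name, Contact Email, Hits, isNew, isPopular, Rating, Votes, ReceiveMail), so check your own links.def first. `links_record` and the sample values are hypothetical.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);

# Build one pipe-delimited record, generating the fields DMOZ lacks.
# Assumption: field order follows a default Links 2.0 links.def.
sub links_record {
    my ($id, $title, $url, $category, $desc, $date) = @_;
    # Keep the delimiter and line breaks out of the free-text fields.
    for ($title, $desc) { s/\|/ /g; s/\s+/ /g; }
    return join('|', $id, $title, $url, $date, $category, $desc,
                '', '', 0, 'No', 'No', 0, 0, 'No');
}

# Stamp every imported link with today's date (day-Mon-year).
my $today = strftime('%d-%b-%Y', localtime);
print links_record(1, 'Example Site', 'http://www.example.com/',
                   'Games/Role_Playing', 'An example listing.', $today), "\n";
```

In a real import the ID would be a counter that starts above the highest ID already in your database, and the month abbreviation depends on the server's locale.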

If you go to http://www.hyperseek.com/dmoz.shtml I have the first version of the DMOZ Exporter, with a GT Links option almost 100% completed. I'm not "officially" announcing it, as we still need to test it, get feedback, etc.... but you're more than welcome to stop by and give it a run.

John
Quote Reply
Re: Open Directory RDF Dump Integration In reply to
Would really like to use that script...but

Quote:
At about line 80, in the section where it's going through the big RDF, is this line:
$Category =~ s/_/ /g;

After that, put something like this:

next if $Category !~ "Something";

Where, exactly? "around line 80" means nothing to me - I'm not a programmer.


[This message has been edited by lordmouse (edited November 05, 1999).]
Quote Reply
Re: Open Directory RDF Dump Integration In reply to
I noticed that myself. I used your nice import at your site at first. It came up with 6000 or so links... then I went to dmoz and found out there were twice as many in the category that I was doing.
Quote Reply
Re: Open Directory RDF Dump Integration In reply to
I just used the Hyperseek utility to pick up select categories, but noticed that the categories under the Computers:Internet:Commercial Services:E-Commerce category are markedly different from those at the dmoz site.

Does this utility read the entire database, or a scaled-down version of it?

How frequently is it refreshed with the latest RDF dumps?

Thanx for advice and direction.

Raza
Quote Reply
Re: Open Directory RDF Dump Integration In reply to
I'm a little bit behind on my DMOZ import... just been too busy to redownload and load up the DB with the latest .rdf files.

I'll try and clear up some time this week.
Quote Reply
Re: Open Directory RDF Dump Integration In reply to
Apologies on the oldness, once again. I'll announce here once I get it reloaded.
Quote Reply
Re: Open Directory RDF Dump Integration In reply to
That would be nice.

By the way, I am not quite clear on how to import the extracted data into Links. Can someone please explain? Also, do I need to create the directory / category structure in the Admin module's category section as well, or does that get automatically created when importing the data?
Quote Reply
Re: Open Directory RDF Dump Integration In reply to
I noticed that too when I first saw that RDF import util. I don't remember exactly but it either misses links with descriptions or links without descriptions. ;)

In any case, if someone with some Perl ability wants to take my RDF_Import.pl, which imports the links into an SQL database, send me an email and I'll email you the file. (It shouldn't be too hard to adapt: just change it so that instead of running an SQL statement, it writes to a flat file.)

Cheers,

Alex
Quote Reply
Re: Open Directory RDF Dump Integration In reply to
Not sure I follow here, ...

>> either misses links with descriptions or links without descriptions.

In the Perl program, or the online query thingie?

Specifically, what's it not doing? I've used that .cgi about 40 times for customers of mine (believe it or not, they all periodically load the entire DMOZ into their database), so I know that it's working from a HS standpoint.

John
Quote Reply
Re: Open Directory RDF Dump Integration In reply to
Maybe I do, sorta. About a hundred or so links that I imported had descriptions that were cut off mid-word or mid-sentence.

Maybe this is the wrong forum, but since it's an ODP thread, why not ask... How do I get the export utility to also export the related categories part of structure.rdf?