DMOZ Import Problem

Good morning,

I am having a problem importing links into the correct categories using the latest version of Parse_RDF.pl. Although I specify a subcategory in my directory, the imported links end up scattered throughout different "Arts & Humanities" sections.

I have extracted a section of the DMOZ file and named it "refined_topics.rdf.tar.gz". My settings in the Parse_RDF.pl file look like this:

-----
# 1. Set this to the location of the content.rdf file (note you can leave the file gzipped).
my $CONTENT = './refined_topics.rdf.tar.gz';

# 2. You can leave the file gzipped if you are short on disk space, just tell
# the program where your gzip program is. The -c decompresses to stdout and is
# required. The -d says to decompress (you can use gunzip as well).
my $GZIP = '/usr/bin/gzip -cd';

# 3. Set what subset of the Open Directory you want to parse.
my $SUBSET = 'Top/Arts/Music';

# 4. You can insert the categories into an existing subcategory, or if you leave this
# blank, links will be added to the existing category.
my $PREFIX = 'Top/Arts & Humanities/Music/';

# 5. Append? If set to 1 the script will add all links and categories, if set to 0, the
# script will only add links/categories that don't already exist (slows down the parsing).
my $APPEND = 1;
-----

I have tried setting $APPEND to both 1 and 0, with the same results.
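For reference, the result these settings are meant to produce can be sketched as below. This is only an illustration of the intended mapping, not the actual logic inside Parse_RDF.pl; the helper name `map_category` and the sample topics are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT taken from Parse_RDF.pl: shows the category
# name the settings above are expected to yield for one DMOZ topic.
sub map_category {
    my ($topic, $subset, $prefix) = @_;

    # Topics outside the chosen subset are skipped.
    return unless $topic eq $subset || index($topic, "$subset/") == 0;

    # Strip the subset path and graft the remainder onto the prefix.
    my $rest = substr($topic, length($subset));
    $rest =~ s{^/}{};
    (my $base = $prefix) =~ s{/$}{};
    return length($rest) ? "$base/$rest" : $base;
}

my $subset = 'Top/Arts/Music';
my $prefix = 'Top/Arts & Humanities/Music/';

# Sample topics are made up for illustration.
for my $topic ('Top/Arts/Music/Jazz', 'Top/Arts/Movies') {
    my $mapped = map_category($topic, $subset, $prefix);
    print "$topic => ", (defined $mapped ? $mapped : '(skipped)'), "\n";
}
```

If the imported links land anywhere other than under the prefix, comparing a few real topic names against a mapping like this may help narrow down where the category names diverge.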

Although I have erased the links for now, you can view my site at http://cRealm.com if that helps to explain the problem.

I sincerely appreciate any help you can offer!!

Thanks in advance,

Katina


Re: DMOZ Import Problem
Are the links being imported, but just into the wrong categories? I had the same problem. I entered all the categories in the same order as they show up in the .rdf, gave each catid in the .rdf the same number that Links gave the category, and ran Parse_RDF, and they imported into the correct categories. This will only work with a small number of links/categories, because you have to edit the files manually. I imported 147 categories and 3000 links with no problem by doing it this way. Hope this helps...
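The manual renumbering step could in principle be scripted. The following is a minimal sketch, assuming you already have a table that pairs each DMOZ catid with the ID Links assigned to the matching category; the helper name `remap_catids` and the numbers in the table are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical mapping from DMOZ catid to the ID Links gave the matching
# category. These numbers are invented for the example.
my %links_id_for = (
    261 => 1,
    262 => 2,
);

# Rewrite every <catid>N</catid> whose N appears in the mapping.
# Unknown catids are left untouched so they are easy to spot later.
sub remap_catids {
    my ($rdf, $map) = @_;
    $rdf =~ s{<catid>(\d+)</catid>}
             {'<catid>' . (exists $map->{$1} ? $map->{$1} : $1) . '</catid>'}ge;
    return $rdf;
}

my $sample = qq{<Topic r:id="Top/Regional/Oceania/Australia">\n}
           . qq{  <catid>261</catid>\n}
           . qq{</Topic>\n};
print remap_catids($sample, \%links_id_for);
```

Building the mapping table is still manual work, so this only saves the hand-editing of the .rdf itself.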

Trust in your elders, for they hold the key to life...
Re: DMOZ Import Problem
Hi Kilroy,

Hmm, unfortunately that probably won't work for me :( I have about 3,000 categories. I am hoping to get Parse_RDF.pl to create/insert the DMOZ subcategories into the main categories on my site.

In answer to your question: yes, the links are imported, but they do not appear in the category I specified. They are scattered throughout my directory in a random fashion. I can't figure it out.

Katina

Re: DMOZ Import Problem
Another possible way is to break the rdf down a little further and then recreate the directory scheme. Other than that, this is probably a pugdog/alex sort of question!!!

BTW: If you are trying to download content.rdf.gz and your browser gets to 99% and never finishes, download and use Mosaic. It works perfectly. I downloaded the entire content.rdf.gz (finally got a clean copy) and broke it up using the Vedit program from a previous post. I believe that if you are in a Windows environment and lack knowledge of Perl in that environment, this is the route to go!!!

Trust in your elders, for they hold the key to life...
Re: DMOZ Import Problem
I'm also having some trouble importing my database. When running Parse_RDF.pl, I get this error after about 5 minutes:

-----------------------

Can't find closing </externalpage> tag! at /usr/local/etc/httpd/cgi-bin/admin/setup/Parse_RDF.pl line 312, <CONTENT_FILE> chunk 5011.
Database handle destroyed without explicit disconnect, <CONTENT_FILE> chunk 5011.

----------------------

Basically, what I did was copy the section I wanted from the DMOZ content.rdf. I'm not sure if I have done it right, but the file is 3 MB.
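One way to check whether a hand-copied excerpt was cut off in the middle of a record is to scan it for `<ExternalPage>` tags that never get closed. The sketch below does only that; it does not reproduce Parse_RDF.pl's own parsing, the helper name `unclosed_external_pages` is hypothetical, and the second record in the sample is a made-up truncated entry.

```perl
use strict;
use warnings;

# Return the line numbers of <ExternalPage ...> tags that are not closed
# before the next one opens (or before the end of the input).
sub unclosed_external_pages {
    my ($text) = @_;
    my @problems;
    my $open_line;      # line number of the currently open tag, if any
    my $line_no = 0;

    for my $line (split /\n/, $text) {
        $line_no++;
        # Walk the opening and closing tags in the order they appear.
        while ($line =~ m{<(/?)ExternalPage\b}g) {
            if ($1) {                   # closing tag
                undef $open_line;
            }
            else {                      # opening tag
                push @problems, $open_line if defined $open_line;
                $open_line = $line_no;
            }
        }
    }
    push @problems, $open_line if defined $open_line;
    return @problems;
}

# Illustrative input: the second record is deliberately left unclosed.
my $sample = <<'RDF';
<ExternalPage about="http://www.auspost.com.au/postcodes/">
  <d:Title>Australia Post - Post Code Search</d:Title>
</ExternalPage>
<ExternalPage about="http://www.example.com/">
  <d:Title>Made-up truncated record</d:Title>
RDF

my @bad = unclosed_external_pages($sample);
print "Unclosed <ExternalPage> starting on line(s): @bad\n" if @bad;
```

To check a real excerpt, read the file into a string first and pass it to the function. A hit near the end of the file would suggest the copy stopped partway through a record.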

This is basically how my rdf looks at the moment:

---------

<Topic r:id="Top/Regional/Oceania/Australia">
  <catid>261</catid>
  <link r:resource="http://www.auspost.com.au/postcodes/"/>
  <link1 r:resource="http://www.csu.edu.au/australia/"/>
  <link r:resource="http://www.whitepages.com.au/"/>
  <link r:resource="http://www.whereis.com.au/"/>
  <link r:resource="http://www.oanda.com/converter/classic?user=wilmap"/>
  <link r:resource="http://www.channel8.net/australia/oz_facts.htm"/>
  <link r:resource="http://apollo13.virtualave.net/search/au/index.shtml"/>
  <link r:resource="http://www.macnet.mq.edu.au/"/>
  <link r:resource="http://www.aaa.com.au/online/local/"/>
  <link r:resource="http://www.pacific.com.au/"/>
  <link r:resource="http://www.asap.unimelb.edu.au/asa/directory/"/>
  <link r:resource="http://www.goeureka.com.au/standard.php"/>
  <link r:resource="http://www.aha-webdesign.de/englisch/html/australia.html"/>
</Topic>

<ExternalPage about="http://www.auspost.com.au/postcodes/">
  <d:Title>Australia Post - Post Code Search</d:Title>
</ExternalPage>

Do I need to change that format?

Thanks
Jason