Gossamer Forum

Parse_RDF.pl - second batch

Hi,

I've parsed my first batch of DMOZ links with Parse_RDF.pl, and it worked great. I'd like to add other DMOZ links/categories to my existing links/categories. However, for some reason the parser completely skips over the entries I've specified and adds nothing to the database.

Any idea what the problem is? I've posted the config portion of the script below..

Thanks! :)
Katina

--

# 1. Set this to the location of the content.rdf file (note you can leave the file gzipped).
my $CONTENT = './content.rdf.gz';

# 2. You can leave the file gzipped if you are short on disk space, just tell
# the program where your gzip program is. The -c decompresses to stdout and is
# required. The -d says to decompress (you can use gunzip as well).
my $GZIP = '/usr/bin/gzip -cd';

# 3. Set what subset of the Open Directory you want to parse.
my $SUBSET = 'Top/Computers/Internet/Commercial Services/Access Providers/By Region/North America/United States/Christian';

# 4. You can insert the categories into an existing subcategory, or if you leave this
# blank, links will be added to the existing category.
my $PREFIX = 'Computers & Internet/Internet Service Providers/';

# 5. Append? If set to 1 the script will add all links and categories, if set to 0, the
# script will only add links/categories that don't already exist (slows down the parsing).
my $APPEND = 1;

# 6. Defaults to use for Add_Date and Contact Name/Contact Email. Note: Don't set add
# date to today, otherwise you will end up with WAY TOO MANY new links.
my $ADD_DATE = '1999-12-01';
my $CONTACT_N = 'DMOZ';
my $CONTACT_E = '';

# 7. Max lines per category. Shouldn't need to touch this.
my $max_limit = 5000;

Re: Parse_RDF.pl - second batch
Hey guys,

I posted the above message a couple days ago, plus sent a request to "support@gossamer-threads.com".. I still have yet to receive an answer from either. Gossamer-threads usually provides excellent customer support, so I don't understand what the hold-up is this time. I really need some help with this to complete my project, so your response is greatly appreciated.

Thank you, :)
Katina

Re: Parse_RDF.pl - second batch
I've never been able to figure out why the parser just skips things.

I've gotten around it by opening the content.rdf file in an editor, cutting out the sections I want, and then running the script on those.
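
For what it's worth, the same kind of cut can be done without opening the file in an editor. This is only a minimal sketch, assuming a POSIX shell with gzip, grep and sed available. The helper names `find_lines` and `cut_lines`, the paths, and the line numbers are made up for illustration, and none of this is how Parse_RDF.pl itself selects entries:

```shell
#!/bin/sh
# Hypothetical helpers, not part of Parse_RDF.pl.

# Print "line-number:text" for every line of gzipped file $1 that
# contains the fixed string $2, so you can see where a section sits.
find_lines() {
    gzip -cd "$1" | grep -n -F -- "$2"
}

# Copy lines $2 through $3 (inclusive) of gzipped file $1 into the
# plain-text file $4.
cut_lines() {
    gzip -cd "$1" | sed -n "${2},${3}p" > "$4"
}

# Example (path and line numbers are placeholders):
#   find_lines ./content.rdf.gz 'Top/Computers/Internet' | head
#   cut_lines  ./content.rdf.gz 120000 185000 ./subset.rdf
```

If `find_lines` prints nothing, the string you searched for does not appear verbatim in the dump, which is worth knowing before you start cutting.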

It seems I could run the script 10 times: 9 times nothing happened, then on the 10th it worked.

I really don't know.

http://www.postcards.com
FAQ: http://www.postcards.com/FAQ/LinkSQL/

Re: Parse_RDF.pl - second batch
Hi Pugdog,

How were you able to open/edit it? Isn't the file close to 1 gig? I think it would probably crash my system.. But at this point, I'm willing to try almost anything.. hehe..

Thanks! :))

Katina

Re: Parse_RDF.pl - second batch
Well, the last time I tried to use it, it was only 600 MB, but the "trick" I used was the 'split' command in Unix to cut the file into 64 MB chunks. I then used "joe" to edit these smaller files and cut out the sections I needed. Fortunately, the parts I needed did not cross file split boundaries. If they had, though, all I would have had to do is come up with another split value that was NOT an even multiple of 64 (such as 45, 80, or even 100) so the boundaries would move.

Some Windows editors now handle files of unlimited size. I've been using EditPlus since someone recommended it, and have been very happy with it! (I need to register it, but it doesn't expire, which is all the more reason I'm using it and will eventually register.)

Re: Parse_RDF.pl - second batch
Hi Pugdog,

Sounds like a great solution.. How do I use the "split" command in Unix? And do I have to unzip the file first?

Katina

Re: Parse_RDF.pl - second batch
Yeah, you need to unzip the file... You might be better off downloading it and doing it on Windows (unless you have that much disk space on the server). Usually people have more local than web disk space.

You can find file split utilities in most of the utility archives.

As for the split command on Unix, it varies a bit from system to system. You should be able to get some info on your flavor with the man command.
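
Roughly, the split step looks like the sketch below. It assumes a POSIX-style `split` that accepts `-b` with an `m` suffix (check `man split` on your system) and reuses the `gzip -cd` form from the script's config. The function name `split_rdf` and the `chunk_` prefix are made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: decompress gzipped file $1 to stdout and cut the
# stream into pieces of size $2 (e.g. 64m), named ${3}aa, ${3}ab, ...
# The full uncompressed file is never written out in one piece, but the
# chunks together take up the same disk space it would.
split_rdf() {
    gzip -cd "$1" | split -b "$2" - "$3"
}

# Example (paths are placeholders):
#   split_rdf ./content.rdf.gz 64m chunk_
#   ls -l chunk_*
# If the section you want straddles two chunks, rerun with another size
# such as 45m or 100m so the boundaries fall elsewhere.
```

Because `-b` counts bytes, a chunk can end in the middle of a line, so look at the start and end of each piece before cutting anything out of it.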

Re: Parse_RDF.pl - second batch
Thanks Pugdog :)) I'll try it out!

Katina