Gossamer Forum

Parse_RDF.pl file error

When I try to run Parse_RDF.pl, the following error appears after about two or three minutes:

DBI::db=HASH(0x80c21ac)->disconnect invalidates 1 active statement. Either destroy statement handles or call finish on them before disconnecting. at Parse_RDF.pl line 358, <CONTENT_FILE> chunk 398720.

What am I doing wrong?
Re: Parse_RDF.pl file error
Are you on a virtual host?



Re: Parse_RDF.pl file error
Yes, I am.
Re: Parse_RDF.pl file error
The first thing that occurred to me is that your ISP has a daemon running that sees the massive CPU usage and terminates the process.

That error message has the ring of a terminated/killed process to it, rather than anything "wrong".


Re: Parse_RDF.pl file error
sportsguy, are you parsing data from ODP (dmoz.org)?

What program are you using?
Re: Parse_RDF.pl file error
I am using Parse_RDF.pl, and yes, I am using the ODP data from dmoz.

I still get the error at the end, but I got it to insert links into the database by manually cutting the categories I want out of the RDF file (baseball, football, etc.).

The problem I am running into now is that the links seem to be inserted into random categories in the database.

For instance, I wanted to insert a bunch of Equestrian links into the Equestrian category, but some ended up in NFL/NewEnglandPatriots, NFL/WashingtonRedskins, etc.

Here is how I am setting the variables at the top of the file:

# Grab subset of Open directory.
my $SUBSET = 'Top/Sports';


my $PREFIX = 'Other_Sports/Equestrian/';

I don't get how this Parse thing works and why it works the way it does.

Frustrating.
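
For readers trying to reason about those two settings, here is a small sketch of one plausible reading of them. It is an assumption about how the script treats $SUBSET and $PREFIX, not a copy of the Parse_RDF.pl logic: a category is kept when its ODP path is $SUBSET or falls below it, and $PREFIX is put in front of whatever remains of the path. map_category is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT taken from Parse_RDF.pl.
# Assumption: paths at or below $subset are kept, the $subset part is
# dropped, and $prefix is prepended to the remainder.
sub map_category {
    my ($odp_path, $subset, $prefix) = @_;

    # Skip anything that is not $subset itself or below it.
    return
        unless $odp_path eq $subset || index($odp_path, "$subset/") == 0;

    my $rest = substr($odp_path, length($subset));  # e.g. "/Equestrian/Dressage"
    $rest =~ s{^/}{};                               # drop the leading slash
    (my $base = $prefix) =~ s{/+$}{};               # drop trailing slashes
    return length($rest) ? "$base/$rest" : $base;
}

my $SUBSET = 'Top/Sports';
my $PREFIX = 'Other_Sports/Equestrian/';

for my $path ('Top/Sports/Equestrian/Dressage',
              'Top/Sports/Football/NFL',
              'Top/Arts/Music') {
    my $local = map_category($path, $SUBSET, $PREFIX);
    printf "%-32s => %s\n", $path, defined $local ? $local : '(skipped)';
}
```

If the script works roughly along these lines, a $SUBSET of 'Top/Sports' matches every sport in the input file, and all of them land under the Equestrian prefix. Narrowing $SUBSET to the Equestrian subtree, and checking the mapping against a test database first, would be the thing to try.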
Re: Parse_RDF.pl file error
Links SQL comes with a Parse_RDF.pl script that reads the content.rdf.gz file and loads whatever sub-tree you want into your Links SQL database.

You FTP the file from DMOZ.org, and then run Parse_RDF.pl on it. The file is over 100 MB at this point; decompressed, mine came to 556 MB. I ran 'split' on it to break it into 50 MB chunks, then used grep to find which chunks had the parts I needed, and used Joe to grab those parts and put them into a separate file.

Since I only wanted certain subsets, spending an hour or two editing the RDF file let me import the sections I needed in a few minutes each. That saved literally _days_ compared with trying to parse several different sub-trees out of the full file.
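
The manual split, grep and edit steps above could also be scripted. The following is only a sketch under stated assumptions: it assumes each category in the RDF file starts on a line containing <Topic r:id="...">, and that the records belonging to a category follow it until the next <Topic ...> line. extract_subtree is a hypothetical name, and the file header is not copied, so check whether your import needs it.

```perl
use strict;
use warnings;

# Copy the records for one subtree from $in to $out.
# Assumptions (not confirmed by this thread): each category starts with a
# line containing <Topic r:id="...">, and everything up to the next
# <Topic ...> line belongs to that category.
# Returns the number of categories kept.
sub extract_subtree {
    my ($in, $out, $subset) = @_;
    my $keep = 0;    # true while we are inside a wanted category
    my $kept = 0;

    while (my $line = <$in>) {
        if ($line =~ /<Topic\s+r:id="([^"]*)"/) {
            my $id = $1;
            $keep = ($id eq $subset || index($id, "$subset/") == 0) ? 1 : 0;
            $kept++ if $keep;
        }
        print {$out} $line if $keep;
    }
    return $kept;
}

# Usage: perl extract.pl Top/Sports/Equestrian content.rdf > Equestrian.rdf
if (@ARGV == 2) {
    my ($subset, $file) = @ARGV;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $count = extract_subtree($fh, \*STDOUT, $subset);
    close $fh;
    print STDERR "Kept $count categories under $subset\n";
}
```

Because it reads one line at a time, the script never holds the whole decompressed file in memory, which matters on a shared host.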

Re: Parse_RDF.pl file error
That is exactly what I did. I have about thirty different .rdf files (Equestrian.rdf, Baseball.rdf, Football.rdf, etc.), but when I tested the script, it seemed to insert the links randomly into the database.

Now I have about 1000 Equestrian and motor sport links randomly inserted into my database.

Sorry. It sounds like I'm whining. I'm just frustrated. Argh.

Re: Parse_RDF.pl file error
The import isn't perfect, and for some reason it works for some people but not others.

If your files are small enough, keep doing the imports into a test database, and you'll eventually figure out what works on your system.

I had to try about 10 or 12 times last night to get it to work, and I still don't really know what I'm doing differently now than I did before :)

Re: Parse_RDF.pl file error
Hi, big question.
Is it possible to re-import DMOZ data into an existing database without destroying the records that are already there?
Here's the situation. Currently I have DMOZ data in my database. Perhaps in a month's time, we will have added categories and links of our own that are not in DMOZ, because the DMOZ editors categorize more strictly than our company does. So in 2 months' time, our database contents will no longer match DMOZ. If I import the DMOZ data again, how can I do it so that the data we have in our database is not lost?

Thanks very much.
Re: Parse_RDF.pl file error
Importing does just that... it imports.

You won't destroy anything, but you might end up with a mess of categories, since you've changed and modified some of the ones they have.

I think the import can be set to skip any _links_ that are already in your database, i.e., matched by URL. I haven't looked at the import code closely, but that would be a select on URL.
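
To make the URL check concrete, here is a minimal sketch of the idea without a database. The list of existing URLs stands in for the result of the select on URL; a real import would run that lookup against the links table, whose name and columns are not given in this thread. split_new_links is a hypothetical name, and URLs are compared exactly, with no normalization.

```perl
use strict;
use warnings;

# Split incoming links into ones to insert and ones to skip.
# $existing_urls stands in for the result of a lookup on the URL column.
sub split_new_links {
    my ($existing_urls, $incoming) = @_;
    my %seen = map { ($_ => 1) } @$existing_urls;
    my (@new, @skipped);

    for my $link (@$incoming) {
        if ($seen{ $link->{url} }++) {
            # Already in the database, or repeated within this import.
            push @skipped, $link;
        }
        else {
            push @new, $link;
        }
    }
    return (\@new, \@skipped);
}

my @existing = ('http://www.example.com/horses/');
my @incoming = (
    { url => 'http://www.example.com/horses/',   title => 'Horses'   },
    { url => 'http://www.example.org/dressage/', title => 'Dressage' },
);

my ($new, $skipped) = split_new_links(\@existing, \@incoming);
print "would insert: $_->{url}\n" for @$new;
print "would skip:   $_->{url}\n" for @$skipped;
```

An exact match misses near-duplicates such as the same site with and without a trailing slash, so a real check might need to normalize URLs first.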

You can also import DMOZ into a subdirectory, as long as it doesn't get confused with your existing categories.

In a few months, there might be better import tools :)