Gossamer Forum

Still trying...

I am still trying to parse the doggone ODP dump, but I keep getting an error. Question: how do you break the ODP dump down into smaller parts without destroying the integrity of the file? Also, how would you do the same for structure.rdf, so that the two sets of pieces correlate (content.rdf 1 with structure.rdf 1, content.rdf 2 with structure.rdf 2)?
Or is this even possible?



Trust in your elders, for they hold the key to life...
Re: Still trying...
I think I posted this in another message.

I just used 'split' (a Unix utility) to chop the file up into 50 MB pieces, then edited them separately. There is no way to parse out just the stuff you need. I've checked the net, and unless I missed something in the 200+ links I checked, there is nothing that will rip apart the ODP file the way people here (and you) want it to.
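For anyone without 'split' handy, here is a rough Python sketch of the same chopping idea. It is only an illustration, not what was actually run above: the function name `split_by_lines` and the `.partNNN` naming are made up for this example. Unlike a plain byte-count split (`split -b`), it cuts only at line ends, so no line is broken in half. It does NOT keep a whole <Topic> block together, so the pieces may still need hand editing at the seams.

```python
def split_by_lines(path, max_bytes=50 * 1024 * 1024, prefix=None):
    """Split `path` into numbered pieces of at most about `max_bytes` each.

    Cuts happen only at line ends. A single line longer than `max_bytes`
    is written whole into its own piece. Returns the list of piece names.
    """
    prefix = prefix or path + ".part"   # hypothetical naming scheme
    pieces = []
    out = None
    written = 0
    with open(path, "rb") as src:
        for line in src:
            # Start a new piece when the current one would overflow.
            if out is None or (written and written + len(line) > max_bytes):
                if out is not None:
                    out.close()
                name = "%s%03d" % (prefix, len(pieces))
                pieces.append(name)
                out = open(name, "wb")
                written = 0
            out.write(line)
            written += len(line)
    if out is not None:
        out.close()
    return pieces
```

Calling `split_by_lines("content.rdf")` would leave content.rdf.part000, content.rdf.part001, and so on next to the original; concatenating the pieces in order gives back the original file.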

BTW, I have the military cuts; I just need to know how to get them to you. They are 2 MB uncompressed.

Re: Still trying...
Thanks pugdog,
You can e-mail them to me at ssgjones@bellsouth.net, or do you have a place I could download them from?
Can you run the split command on a virtual server via telnet?

Re: Still trying...
I don't know. Just type 'which split' and see if you have access to it.
If not, you might be able to download a version of split from one of the Unix sites and run it. I know there are Perl versions of split around, so that would not be a problem. (Perl, after all, _IS_ a text processing language <G>.)

Re: Still trying...
Got the .rdf's, thanks a lot!
Next problem...
I have a similar problem to a guy a few posts back: the program is inserting the links in a seemingly random way, and none of the categories have the correct links in them. Do I need to clip the portion of structure.rdf that contains these categories and load it in the same directory as content.rdf?

Re: Still trying...
Cool, I've been upgraded to Newbie!!

Re: Still trying...
I don't know why I've never hit that problem.

What you need to do is make a path to the category.

This is the beginning of what I sent (cut off here at the end of Military):

Code:
<Topic r:id="Top/Regional/North_America/United_States/Government/Military">
<catid>21094</catid>
<link r:resource="http://www.usrotc.com"/>
<link r:resource="http://www.defenselink.mil/"/>
<link r:resource="http://www.militarycity.com/"/>
<link r:resource="http://www.globemaster.de/start.html"/>
<link r:resource="http://www.infoplease.com/ipa/A0004597.html"/>
<link r:resource="http://www.dodea.osd.mil"/>
<link r:resource="http://www.grunts.net"/>
<link r:resource="http://searchmil.com/"/>
<link r:resource="http://usmilitary.about.com/"/>
<link r:resource="http://www.globemaster.de/govsearch/"/>
</Topic>
That means you _might_ need to set up a set of categories:

Code:
<Topic r:id="Top/Regional">
<catid>1</catid>
</Topic>
<Topic r:id="Top/Regional/North_America">
<catid>2</catid>
</Topic>
<Topic r:id="Top/Regional/North_America/United_States/Government">
<catid>3</catid>
</Topic>
And put them before the category. Or, you might need to make a partial tree on your site.

You'd only need to do this for the FIRST of the categories (i.e. the one at the beginning of the file I sent), since the rest of the categories would have paths attached.
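If typing the parent categories by hand gets tedious, something like the following Python sketch could generate them from the first category's path. Treat it as an assumption-laden illustration: `ancestor_topics` is a made-up helper, the catid numbers are placeholders counted up from 1, and whether the importer actually needs these stubs is still the "might" from above. It emits every intermediate level between Top and the category itself.

```python
def ancestor_topics(topic_path, first_catid=1):
    """Return <Topic> stubs for every ancestor of `topic_path` below "Top",
    shallowest first. The category itself is not included."""
    parts = topic_path.split("/")
    stubs = []
    # parts[0] is "Top"; stop before the last part (the category itself).
    for depth in range(2, len(parts)):
        path = "/".join(parts[:depth])
        catid = first_catid + len(stubs)   # placeholder IDs, not real ODP catids
        stubs.append('<Topic r:id="%s">\n<catid>%d</catid>\n</Topic>' % (path, catid))
    return "\n".join(stubs)


print(ancestor_topics(
    "Top/Regional/North_America/United_States/Government/Military"))
```

The printed stubs would go at the top of the file, ahead of the first real category.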

I guess the way I got around it is I had a similar layout 1 or 2 levels deep already in my site, and it just meshed.

Let me know if that works.

I don't know why I haven't hit that problem, and I've imported a dozen or so sites like this. I wonder what I'm doing differently?



Re: Still trying...
Okay,
Here's what worked for me...
If there are some others out there with the same problem, try this...

I first went through the .rdf (obviously this will only work with smaller directories, say in the 5,000-10,000 link range) and recreated the entire category tree in Links, i.e.

<Topic r:id="Top/Regional">
<catid>1</catid>
</Topic>
<Topic r:id="Top/Regional/North_America">
<catid>2</catid>
</Topic>
<Topic r:id="Top/Regional/North_America/United_States/Government">
<catid>3</catid>
</Topic>

became these categories in Links:
Regional
Regional/North_America
Regional/North_America/United_States
and so forth..

I then replaced each <catid> value in the .rdf (e.g. <catid>24568</catid>) with the corresponding category ID from Links. Then I ran Parse_RDF.pl, and it worked like a charm. The only drawback would be with a larger .rdf file, say anything over 4-5 MB in size. Hope this helps anyone who has a similar problem. If you know of an easier way, let me know!
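For a bigger file, the catid swap could probably be scripted instead of done by hand. Below is a minimal Python sketch of that step, not the procedure actually used above. It assumes each <catid> line comes directly after its <Topic r:id="..."> line (as in the excerpts in this thread), that the Links category name is the ODP path minus the leading "Top/", and that you already have a path-to-ID mapping out of Links. The helper name `remap_catids` and the IDs in the example are hypothetical.

```python
import re

# Matches a Topic opening tag plus the catid element that follows it.
# group(2) is the category path without the leading "Top/".
TOPIC_RE = re.compile(r'(<Topic r:id="Top/([^"]+)">\s*<catid>)\d+(</catid>)')


def remap_catids(rdf_text, links_ids):
    """Replace each Topic's <catid> with the ID that category has in Links.

    `links_ids` maps a path such as "Regional/North_America" to an integer.
    Topics that are missing from the mapping are left unchanged.
    """
    def swap(match):
        new_id = links_ids.get(match.group(2))
        if new_id is None:
            return match.group(0)
        return "%s%d%s" % (match.group(1), new_id, match.group(3))

    return TOPIC_RE.sub(swap, rdf_text)


sample = '<Topic r:id="Top/Regional/North_America">\n<catid>24568</catid>\n</Topic>'
print(remap_catids(sample, {"Regional/North_America": 2}))
```

Leaving unmapped topics untouched makes it easy to search the output afterwards for any old catids that still need a category created in Links.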

Re: Still trying... Success!
I guess we should finally change the title of this thread to "success" <G>

Re: Still trying... Success!
Roger that!
Thanks for all your help, pugdog; it's greatly appreciated. Thanks also to Alex for setting up the LinksSQL so quickly!
I don't suppose I could get anyone to parse military history at http://www.dmoz.org/Society/History/War/ for me?
;)

