Gossamer Forum

RDF Format....

Hi. I have just been looking at DMOZ's RDF dumps. Recently I have been doing some DMOZ dumps with Links SQL. However, some of the categories towards the bottom of the file (remember it is 900 MB!) are taking up to 3 days to get to! (I'm only on a 233 MHz server.) So, what I was trying to do was cut the main RDF file into about 13 files (1 for each main category). That way there will be less work for the server, as each file is much smaller. The problem I am encountering, though, is how to work out when I'm at the end of a main category. I was hoping that the main topics would end with something like </Topic r:id="Top/Arts"> (seeing as they start with <Topic r:id="Top/Arts">), but that turned out not to be the case. Now I am looking for other ways to read through the file and cut it up.

Basically, all I need to work out (with your help) is how to locate where the main top categories start and end. Anyone got any ideas on how to do this?

Thanks

Andy (mod)
andy@ultranerds.co.uk
Want to give me something back for my help? Please see my Amazon Wish List
GLinks ULTRA Package | GLinks ULTRA Package PRO
Links SQL Plugins | Website Design and SEO | UltraNerds | ULTRAGLobals Plugin | Pre-Made Template Sets | FREE GLinks Plugins!
Re: [AndyNewby] RDF Format.... In reply to
How about installing Perl and a server on your PC? I guarantee better results if you've got a decent computer.

Anyway, what I've done in the past is loop through the content.rdf file and save every "x" lines to a different file. Then I'd do a short patch-up job to fix the broken categories, then parse each file.
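
For illustration, here is a minimal sketch of the "every x lines" split. It is written in Python only to keep it short; the same loop ports directly to Perl. The names chunks_of_lines and split_file and the "chunk" filename prefix are made up for this example. Note that a chunk boundary can land in the middle of a category, which is why the patch-up step is still needed afterwards.

```python
from itertools import islice


def chunks_of_lines(src, lines_per_chunk):
    """Yield successive lists of at most lines_per_chunk lines from src."""
    it = iter(src)
    while True:
        chunk = list(islice(it, lines_per_chunk))
        if not chunk:
            return
        yield chunk


def split_file(path, lines_per_chunk, prefix="chunk"):
    """Write every lines_per_chunk lines of path to its own numbered file.

    Hypothetical helper: output files are named chunk_000.rdf, chunk_001.rdf, ...
    surrogateescape keeps any bytes that are not valid UTF-8 unchanged.
    """
    opts = dict(encoding="utf-8", errors="surrogateescape", newline="")
    with open(path, **opts) as src:
        for n, chunk in enumerate(chunks_of_lines(src, lines_per_chunk)):
            with open(f"{prefix}_{n:03d}.rdf", "w", **opts) as out:
                out.writelines(chunk)
```

Only one chunk is held in memory at a time, so the size of the whole file does not matter much; pick "x" to suit the machine.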

You could, since you know what the main categories are, loop through the file line by line and write each line to another file until the next main category is found, then start the next file.
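
A rough sketch of that line-by-line approach, again in Python just for brevity (the logic is the same in Perl). It assumes each category opens with a line containing <Topic r:id="Top/Arts..."> as described in the first post; the real dump layout may differ. The function names and the one-file-per-category naming are invented for this example.

```python
import os
import re

# Assumption: category lines look like <Topic r:id="Top/Arts/Music">.
# The capture group is the top-level name ("Arts"), up to the next "/" or quote.
TOPIC_RE = re.compile(r'<Topic r:id="Top/([^/"]+)')


def top_category(line):
    """Return the top-level category named on a <Topic> line, else None."""
    m = TOPIC_RE.search(line)
    return m.group(1) if m else None


def split_by_top_category(src_path, out_dir):
    """Stream src_path, writing each line to <out_dir>/<category>.rdf.

    Returns the category names in the order they were first seen.
    """
    opts = dict(encoding="utf-8", errors="surrogateescape", newline="")
    seen = []
    current = None
    out = None
    with open(src_path, **opts) as src:
        for line in src:
            cat = top_category(line)
            if cat is not None and cat != current:
                # A new main category starts here: switch output files.
                if out is not None:
                    out.close()
                # Append mode, in case a category turns up again later on.
                out = open(os.path.join(out_dir, cat + ".rdf"), "a", **opts)
                current = cat
                if cat not in seen:
                    seen.append(cat)
            if out is not None:
                # Lines before the first main category (the RDF header) are skipped.
                out.write(line)
    if out is not None:
        out.close()
    return seen
```

Each output file will be missing the RDF header and the closing tag from the original dump, so those would still have to be patched in before parsing. Because the files are opened in append mode, clear out old output before re-running.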

--Philip
Links 2.0 moderator
Re: [AndyNewby] RDF Format.... In reply to
Try looking in, err, I think it's RDFS2.pm or something, lol, for how GT do it.
Re: [RedRum] RDF Format.... In reply to
Use vedit; See:
http://www.gossamer-threads.com/...string=vedit;#175481

I start out by setting a line marker and putting top/arts into the search field; it goes to top/arts, highlighting everything in between. It lets you save the selection as YourNewName.rdf. Then undo the line markers, set a new line marker on the line above top/arts, search for top/business, and repeat the whole process. It takes me about 45 minutes to break down the entire dmoz.


</not a clue>

Last edited by Kilroy: Jan 9, 2002, 5:20 PM
Re: [Kilroy] RDF Format.... In reply to
Thanks for that, Kilroy and Philip. However, there is a bit of a problem with what you are suggesting. I'm on a 28.9k connection, 33.33k if I'm lucky. God knows how long even the GZ file will take to download.

Thanks Paul, I'll have a look at that. To be honest, I didn't think of that.

Andy (mod)
andy@ultranerds.co.uk