Gossamer Forum

Dmoz Splice??

Hi,

I was just wondering how you guys are coming along with the dmoz dumps?? Has anyone had a chance to splice the "Sports" section out of the dmoz dump yet??

Keep me updated,

Al


Re: [KidKilowatt] Dmoz Splice?? In reply to
Hi. I'm still in the process of getting that category sorted. I'll let you know when I have it ready (unless someone already has a recent slice done).

Thanks

Andy (mod)
andy@ultranerds.co.uk
Want to give me something back for my help? Please see my Amazon Wish List
GLinks ULTRA Package | GLinks ULTRA Package PRO
Links SQL Plugins | Website Design and SEO | UltraNerds | ULTRAGLobals Plugin | Pre-Made Template Sets | FREE GLinks Plugins!
Re: [KidKilowatt] Dmoz Splice?? In reply to
I have not done a recent slice of the Dmoz dump.

If I get my old machine back, I might dedicate it to doing just that -- and work on a way to dump out portions of the database in a format that can be imported into Links, even if that format is RDF.

From what I've seen of RDF, it's a fairly simple hierarchical tagging system, and computers are pretty good about doing things like that in nested loops.

The Links importer imports RDF pretty well, so that might be a way to go.

I've been thinking about this problem for well over a year, and have been limited by hardware, and time. I've been off work for the better part of 6 months as well, which makes money/time a major issue.

I've thought of things such as
  • maintaining the Dmoz directory, auto-pruning dead links, and importing new links as the updates come up.
  • doing "section" or "category" slices, dumps and pre-digested imports.
  • developing a "search-dump" program that allows you to search for links, mark them, and then get an import file for those links (combination of mylinks + dump)
  • pre-digested "niche" dumps for various types of sites using pre-done search lists.
  • URL matching, so that YOU can contribute to the directory by offering up your database dump, and we can run it through the program, picking out URLs that don't exist in the current database, as well as comparing URLs that do for improved descriptions, keywords, etc. (You know I'm dreaming now, don't you???)
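
The URL-matching idea in the last bullet boils down to a set comparison. Here is a minimal sketch, assuming each database dump has already been reduced to a mapping of URL to description. The function names, the normalization rule, and the "longer description counts as improved" test are all illustrative assumptions, not anything Links or Dmoz provides:

```python
from urllib.parse import urlsplit


def normalize(url):
    """Crude normalization (an assumption): lowercase the scheme and host,
    drop a trailing slash from the path, keep any query string."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{path}{query}"


def match_urls(current, contributed):
    """Compare two {url: description} dicts.

    Returns (new, longer): URLs missing from `current`, and URLs present in
    both whose contributed description is longer than the existing one.
    """
    known = {normalize(u): d for u, d in current.items()}
    new, longer = {}, {}
    for url, desc in contributed.items():
        key = normalize(url)
        if key not in known:
            new[key] = desc
        elif len(desc) > len(known[key]):
            longer[key] = desc
    return new, longer
```

Description length is only a stand-in here; deciding what really counts as an "improved" description would need a human or a better rule.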


Anyway, many things are possible, it's a matter of time/money, more than anything else.




PUGDOG® Enterprises, Inc.

The best way to contact me is to NOT use Email.
Please leave a PM here.
Re: [KidKilowatt] Dmoz Splice?? In reply to
Hi. Just wondering if you had a specific category you wanted. It's just that the whole sports section will take approx. 4 days to complete. If you have a smaller subcategory you want done, it could be much faster.

Let me know what you decide (post a reply or email me at webmaster@ace-installer.com).

Thanks

Andy (mod)
Re: [AndyNewby] Dmoz Splice?? In reply to
I'm looking for an updated cut of just the Ohio section - under Regional/North America/U.S.

Do you have that available?

Thanks a lot.



Mark G. - OhioBiz
Re: [mgeyman] Dmoz Splice?? In reply to
No, but I'll do it for you now if you want... What is your email address, and we can talk further.

Thanks

Andy (mod)
Re: [pugdog] Dmoz Splice?? In reply to
Pugdog,

What program did you use to split the rdf file into smaller chunks? I have found a program that edits it just fine, but I need to break it into smaller, more manageable chunks.



thanks,



AL
Re: [KidKilowatt] Dmoz Splice?? In reply to
I think he was referring to nph-import.cgi, which comes with Links SQL 2.

Andy (mod)
Re: [KidKilowatt] Dmoz Splice?? In reply to
Okay, about this splice/category thing with the Dmoz rdf file. Here is how you do it:

1. Use either wget or lwp to download the file to your server. It will require approximately 200 MB of space.

2. Download it to your computer. I'm on a cable modem and it takes me about 10 minutes or less to download.

3. Unzip it using your favorite zip utility.

4. Go to www.vedit.com and download the trial version of the vedit program.

5. Install vedit on your computer and open the content.rdf file. You will be able to load and edit the entire rdf file with vedit.

6. If you want a particular section, do a search for, let's say Top/Arts. Go to the file menu and set a line marker. Once you have that done, you can either page down or search for the next category, such as Top/Business, and the line marker will highlight everything from the beginning of your search to the end. Go to the file menu and save block as whateveryouwantittobe.rdf.

7. Upload this small rdf file to your server and import using Links. I can break down the entire content.rdf file into about 20 different categories in about an hour using this program.

NOTE: If you try to search from Top/Arts all the way to the bottom of the file (i.e. Top/World), the program will lock up. Go from one Top/(category) to the next Top/(category), or use the side scroll bar in small increments, and it won't lock up on you. This is not a fault of the program, but a result of the sheer size of the file.



I regularly use this method, and it is very easy to do once you get used to the program and its search capabilities. I am currently parsing the latest rdf file into Links, and I took out the Top/Adult section in about 5 minutes.
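
The same search-from-one-category-to-the-next approach can also be sketched as a streaming script, so the file never has to fit in an editor. This is a minimal sketch and not the Vedit procedure itself. It assumes each category starts on a line containing `<Topic r:id="Top/...`, and that all of a section's lines sit together in the dump. The name `slice_section` is made up for illustration:

```python
def slice_section(lines, prefix):
    """Yield the lines from the first Topic under `prefix` up to (but not
    including) the first later Topic that falls outside it.

    Assumes (not verified against a real dump) that category records begin
    on a line containing '<Topic r:id="..."' and that a section is contiguous.
    """
    marker = '<Topic r:id="'
    inside = False
    for line in lines:
        pos = line.find(marker)
        if pos != -1:
            # Pull out the category path, e.g. Top/Arts/Music
            topic = line[pos + len(marker):].split('"', 1)[0]
            in_section = topic == prefix or topic.startswith(prefix + "/")
            if inside and not in_section:
                break  # reached the next section, so stop reading
            inside = in_section
        if inside:
            yield line
```

Fed with an open file, e.g. `slice_section(open("content.rdf", encoding="utf-8", errors="replace"), "Top/Sports")`, it reads line by line and stops at the first category outside the section. The output has no XML header or closing tag, so check whether your importer needs them added back.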

Vedit, get it, use it, enjoy...

</not a clue>
Re: [KidKilowatt] Dmoz Splice?? In reply to
I did it with EditPlus on my PC at first, then I slightly modified a script I found on the forums here to do it.

You can probably still find the script in the Discussions forum (?) if you search for it.

The script cuts it into categories, then I edited the categories to put the headers back on them. I don't
have my edited script any more -- that's one thing I lost when my servers crashed. But, I remember the
idea, and this is what I alluded to in my earlier post.
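
Since the original script is lost, the following is not it. It is a minimal sketch of the same idea: cut a stream of lines into top-level categories and put a header and footer back on each piece. The header text you pass in, the `</RDF>` footer, the `<Topic r:id="Top/...` line format, and the function name are all assumptions to check against a real dump:

```python
import os


def cut_into_categories(lines, out_dir, header, footer="</RDF>\n"):
    """Write one file per top-level category (e.g. Top_Arts.rdf) into
    `out_dir`, each wrapped in `header` and `footer`.

    Lines before the first Topic are dropped; the caller supplies the
    header to put back. Returns the sorted list of category file stems.
    """
    marker = '<Topic r:id="'
    files = {}
    current = None
    try:
        for line in lines:
            pos = line.find(marker)
            if pos != -1:
                topic = line[pos + len(marker):].split('"', 1)[0]
                # "Top/Arts/Music" -> "Top_Arts"
                name = "_".join(topic.split("/")[:2])
                if name not in files:
                    path = os.path.join(out_dir, name + ".rdf")
                    files[name] = open(path, "w", encoding="utf-8")
                    files[name].write(header)
                current = files[name]
            # Skip the dump's own closing tag; each piece gets its own footer
            if current is not None and line.strip() != footer.strip():
                current.write(line)
    finally:
        for f in files.values():
            f.write(footer)
            f.close()
    return sorted(files)
```

Only one file handle per top-level category stays open, so memory use stays small no matter how large the dump is.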




PUGDOG® Enterprises, Inc.

The best way to contact me is to NOT use Email.
Please leave a PM here.
Quote Reply
Re: [Kilroy] Dmoz Splice?? In reply to
This is what I used EditPlus for. You can also use Joe (Joe's Own Editor, on Unix) to do it. But you need a _lot_ of RAM and disk space.

If you can find the script I mentioned, it's a good slicer.




PUGDOG® Enterprises, Inc.

The best way to contact me is to NOT use Email.
Please leave a PM here.
Quote Reply
Re: [pugdog] Dmoz Splice?? In reply to
EditPlus crashes trying to open the rdf file. In any case that requires uploading and downloading unless you have Mandrake and a cable modem :)

That's why there are programs like vi and tail :)

Last edited by RedRum: Dec 31, 2001, 9:45 AM
Re: [RedRum] Dmoz Splice?? In reply to
I used HJSplit to break it into 25 MB files and then edited them with Vedit. Worked fine.
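
For anyone without HJSplit, the same kind of split can be sketched in a few lines. This is not how HJSplit works internally. It is a minimal sketch that cuts only at line ends, so no line is broken in half, and the function name and the `.001`-style suffix are assumptions chosen for illustration:

```python
def split_by_size(src_path, chunk_bytes=25 * 1024 * 1024):
    """Split a file into numbered pieces of roughly `chunk_bytes` each,
    cutting only at line ends. Returns the list of piece paths."""
    paths = []
    out = None
    written = 0
    with open(src_path, "rb") as src:
        for line in src:
            # Start a new piece once the current one has reached the target size
            if out is None or written >= chunk_bytes:
                if out is not None:
                    out.close()
                path = f"{src_path}.{len(paths) + 1:03d}"
                paths.append(path)
                out = open(path, "wb")
                written = 0
            out.write(line)
            written += len(line)
    if out is not None:
        out.close()
    return paths
```

Each piece can run slightly over the target because the cut waits for the end of a line. Concatenating the pieces in order gives back the original file.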
Re: [KidKilowatt] Dmoz Splice?? In reply to
...what I'm saying is that not everyone has the ability to get the rdf onto their PC.
Re: [RedRum] Dmoz Splice?? In reply to
Joe runs on the host; it's a Unix editor. I like it _much_ better than vi or even Emacs, and my college helped develop that program while I was there....


PUGDOG® Enterprises, Inc.

The best way to contact me is to NOT use Email.
Please leave a PM here.