Gossamer Forum
DMOZ Wizard
I purchased the DMOZ Wizard a number of months ago to support DMOZ import for LinksSQL. Both worked fine. Recently I had an HDD failure in the web server and needed to rebuild and reinstall LinksSQL and the DMOZ Wizard plugin.

I managed to get LinksSQL installed and running fine. But after the DMOZ Wizard plugin installation, I tried to "setup DMOZ job", then "start". Only 2 lines were shown:

* Blanked out dmoz_cron.cgi
* Cleaned out LinksSQL tables....

Then nothing else.

I looked at the database/link status, and the total number of links is 0.

It seems that DMOZ Wizard did not manage to get the links from DMOZ.

Please....please help.

Thanks in advance.

Jong
Re: [ccjong] DMOZ Wizard In reply to
Probably something simple. Did you set 'run full dump?' to 'yes' or 'no'?

Also, are you actually running dmoz_cron.cgi (or setting it up on cron)?

Cheers

Andy (mod)
andy@ultranerds.co.uk
Want to give me something back for my help? Please see my Amazon Wish List
GLinks ULTRA Package | GLinks ULTRA Package PRO
Links SQL Plugins | Website Design and SEO | UltraNerds | ULTRAGLobals Plugin | Pre-Made Template Sets | FREE GLinks Plugins!
Re: [Andy] DMOZ Wizard In reply to
Hello Andy,

I set 'run full dump?' to 'no'. I just wanted to dump the "computer" category.

After I clicked "start" the 2 messages appeared:
Blanked out dmoz_cron.cgi
Cleaned out LinksSQL tables....

I was waiting for the commands to appear so that I could use SSH/telnet to proceed with the dump. But it just stopped there. And of course, I have forgotten the command for running the dump.

Please help.

Jong
Re: [ccjong] DMOZ Wizard In reply to
I think I need to reword this part really. What it really means is:

If you want to import one or more of the categories from the left, then you need to set it to 'yes' (this sounds like what you should be doing).

If you want to import specific categories, e.g. Computers/Internet, then you enter them into the text-area box on the right, and then you need to set 'run full dump?' to 'no'.

Hope that helps :)

Cheers

Andy (mod)
Re: [Andy] DMOZ Wizard In reply to
>> I think I need to reword this part really. What it really means, is;

<G> I felt the same way, and started to fix up some of the templates more to my way of thinking. I'll pass them on in a few days, unless you beat me to a new release of this.

I've been playing with this quite a bit, and I'm impressed with the speed.


PUGDOG Enterprises, Inc.

The best way to contact me is to NOT use Email.
Please leave a PM here.
Re: [pugdog] DMOZ Wizard In reply to
Quote:
<G> I felt the same way, and started to fix up some of the templates more to my way of thinking. I'll pass them on in a few days, unless you beat me to a new release of this.

Go for it :)

Quote:
I've been playing with this quite a bit, and I'm impressed with the speed.

Which is why I wrote it in the first place :p Definitely a lot faster than running it *normally* through nph-import.cgi (even though it does use nph-import.cgi, but it tweaks the data).

Cheers

Andy (mod)
Re: [Andy] DMOZ Wizard In reply to
This DMOZ Wizard plugin is brilliant. I like how easy it is to specify multiple categories (once I followed the instructions correctly). Thanks for the good work.

Rod

-----------------------------------------------
Yoga teachers forum / Ringtones for Australia, UK
Re: [rh] DMOZ Wizard In reply to
Thanks :)

Cheers

Andy (mod)
Re: [pugdog] DMOZ Wizard In reply to
I started to rework this plugin, and will be making other changes. For now, the major changes (besides code formatting) are in the Setup_Job screen.

Attached is a jpg of the current screen.

Current users should be able to just copy the new .pm file over the old one, and get the new screens. I tried it out on a few sites, and it's working ok.

The new version will also move the parse/dump files into the admin/tmp directory (out of the admin folder itself), and the code will move into a Plugins::DMOZ_Wizard directory to help keep the core Links area uncluttered. It will also contain some code tweaks and fixes for readability/maintainability.

No new features.


PUGDOG Enterprises, Inc.
Re: [pugdog] DMOZ Wizard In reply to
Hi PugDog,

as a novice type of user I have a suggestion.

Put the first line, "Do you want to clean out the database first", in bold red and mark it 'for advanced users only'.

I chose "clean out the database first" taking it literally, thinking it would clean my database of old/test links.

It doesn't actually mean that. It really should say: do you want to reset your database to the fresh-installation state and wipe all modifications and custom columns?

The help file says:

"This is pretty obvious. Set to 'yes' if you want to wipe your database before starting the import. This process is *NOT* reversible!"

I still interpreted that as wiping/cleaning up old links from the database.

Andy saved my bacon with this, because I would rather have pulled my teeth out with a spoon than redo all my columns.

I can laugh now but that is a powerful little button.

rgds
Kev

Cheers
KevM
Re: [KevM] DMOZ Wizard In reply to
Yeah, point taken. I'm not a fan of the destructive stuff, but it was on the original screen, so I kept it in.

Because this is an option you are not going to use on a "live" site, what if we made it a separate option on the left menu that you'd have to select, and that came with a warning page before you could delete?

I'm not a fan of this being in the cron file to begin with, so would that work, or are there problems with that idea? Wiping the database takes only a short time, no matter what -- and like I said, you can't do it on a live site anyway.


PUGDOG Enterprises, Inc.
Re: [pugdog] DMOZ Wizard In reply to
Quote:
I'm not a fan of this being in the cron file to begin with, so would that work, or are there problems with that idea? Wiping the database takes only a short time, no matter what -- and like I said, you can't do it on a live site anyway.

Remember.. I wrote this plugin with myself in mind :) It was to speed up the large number of DMOZ imports that I was getting last year, and it certainly did that.

It started off with literally just the 'writing' facility, where it would cut up the data from the content.rdf.u8.gz file, and then import each category that you specified.
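To make the "cut up the data" step concrete, here is a minimal sketch in Python (the plugin itself is Perl, and this is not its code). It assumes the dump is laid out as ExternalPage blocks that each carry a topic element; the sample data, the function names and that layout are illustrative assumptions, not taken from the plugin.

```python
import io

# Tiny stand-in for the dump; the element layout here is an assumption.
SAMPLE = """\
<ExternalPage about="http://example.com/a">
  <d:Title>Site A</d:Title>
  <d:Description>First site</d:Description>
  <topic>Top/Computers/Internet</topic>
</ExternalPage>
<ExternalPage about="http://example.org/b">
  <d:Title>Site B</d:Title>
  <d:Description>Second site</d:Description>
  <topic>Top/Arts/Music</topic>
</ExternalPage>
"""


def extract_topic(block_text):
    """Return the text between <topic> and </topic>, or None if absent."""
    start = block_text.find("<topic>")
    end = block_text.find("</topic>")
    if start == -1 or end == -1:
        return None
    return block_text[start + len("<topic>"):end].strip()


def slice_rdf(lines, category):
    """Yield each ExternalPage block filed under category or a subcategory."""
    block = []
    inside = False
    for line in lines:
        if line.lstrip().startswith("<ExternalPage"):
            inside = True
            block = []
        if inside:
            block.append(line)
            if line.strip() == "</ExternalPage>":
                inside = False
                text = "".join(block)
                topic = extract_topic(text)
                # Match the category itself or anything below it,
                # but not a sibling that merely shares a prefix.
                if topic is not None and (
                    topic == category or topic.startswith(category + "/")
                ):
                    yield text


pages = list(slice_rdf(io.StringIO(SAMPLE), "Top/Computers"))
```

For the real file, the same function could be fed from gzip.open(path, "rt", encoding="utf-8") instead of the in-memory sample, so the dump is read line by line rather than loaded whole.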

As I needed to do more with it, I added features such as setting up crontabs automatically, verifying that categories exist before an import, cleaning out the database (if you want to start from scratch), sending email notices when the import process has run for each category, and more. A lot of the above features were in there just for me... but I left them in there, just in case someone wanted/needed them in the future :)

Cheers

Andy (mod)
Re: [Andy] DMOZ Wizard In reply to
Ok.

I wondered about the email, since most imports go pretty quickly.

I reworded the page, put the bad stuff in red, and the good option in green, and maybe that will make a difference until I can move it to its own page.

I just have to rewrite the file routines and install, and make a clean up for previously installed versions, and it should be ready for re-release. Figure a few days.

Things are a bit hectic this weekend, so I'm not sure how much I'll get done.


PUGDOG Enterprises, Inc.
Re: [pugdog] DMOZ Wizard In reply to
Hiya;

A few comments. We're dying for a really good DMOZ import routine, so here are my comments and needs!

The biggest problem with DMOZ imports is wiping out existing data.
To make the import truly useful, when a record is found that already exists, the webmaster should be given the option of adding, editing and adding, or skipping. If it's skipped from import, it should ideally be added to a list of skipped sites. For a new import, the checking could be skipped entirely, as a user-selectable option.

As I have DMOZ entries that were later re-edited and reviewed, we don't want to lose our new descriptions!

This way, we could run a monthly DMOZ update with no worry of overwriting existing information or changes to existing DMOZ sites.

You could even set a database field, listing the date of the DMOZ import on the record.

I would have bought this long ago, but its 2nd use will wipe out the 1st use, so it was not usable at all for keeping up to date with DMOZ.

The other thing is that the plugin could not be set for multiple DMOZ categories to be brought into different, individually specified LINKS categories. Has this changed?

regards...
Re: [webslicer] DMOZ Wizard In reply to
Quote:
I would have bought this long ago, but its 2nd use will wipe out the 1st use, so it was not usable at all for keeping up to date with DMOZ.

Erm, are you selecting 'no' for 'clean out existing data' ?

It shouldn't wipe out the existing data (that's the whole point of the plugin, so you can add extra categories as and when you want).

I think what you need to know is that this import script doesn't actually have its own import routines, but simply makes use of nph-import.cgi, and generates all the required slices, commands etc. to make the import run the fastest.

Cheers

Andy (mod)
Re: [webslicer] DMOZ Wizard In reply to
I've been using this awhile, and it doesn't duplicate links in the same category. I think those links are just skipped. I've imported the same groups of links several times, with no change to the links count, and "skipped/missing" links added (well, the link count changed, but by a few links, not doubled).

As Andy said, in not so many words, this is just an optimized, user-friendly shell around the nph-import.cgi script (which, while functional in a utilitarian way, is a bear to seed).

I've changed the screens to be more OBVIOUS that selecting the "clean" option will damage the existing database (see updated attached screen shot).

Actually, this has been updated once more, so that the NO is in bold, larger caps, and black-boxed.

Adding the DMOZ import date is a fairly trivial addition. If the contact information field is set to "DMOZ", then the script could execute an "UPDATE Links SET DMOZ_Date = NOW() WHERE Contact = 'DMOZ' AND Mod_Date = CURDATE()"

You might also need to check the start/end time of the script and add a "Mod_Date = yesterday" condition, in case the import runs past midnight.
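A rough sketch of that date-stamping update, written in Python against an in-memory SQLite table so it can be run anywhere. Links SQL itself runs on a different database, and the DMOZ_Date column is the hypothetical new field from the post; the table layout, sample rows and function names are assumptions for illustration only.

```python
import sqlite3


def make_demo_db():
    """Build a throwaway Links table with three sample rows (illustrative only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Links (URL TEXT, Contact TEXT, Mod_Date TEXT, DMOZ_Date TEXT)"
    )
    conn.executemany(
        "INSERT INTO Links (URL, Contact, Mod_Date) VALUES (?, ?, ?)",
        [
            ("http://example.com/a", "DMOZ", "2004-04-20"),
            ("http://example.com/b", "DMOZ", "2004-04-19"),
            ("http://example.com/c", "Jane", "2004-04-20"),
        ],
    )
    return conn


def stamp_dmoz_imports(conn, stamp, mod_dates):
    """Set DMOZ_Date on DMOZ-contact links whose Mod_Date is in mod_dates.

    Passing both today and yesterday in mod_dates covers an import
    that started before midnight and finished after it.
    Returns the number of rows updated.
    """
    placeholders = ", ".join("?" for _ in mod_dates)
    sql = (
        "UPDATE Links SET DMOZ_Date = ? "
        "WHERE Contact = 'DMOZ' AND Mod_Date IN (" + placeholders + ")"
    )
    cur = conn.execute(sql, [stamp, *mod_dates])
    conn.commit()
    return cur.rowcount


conn = make_demo_db()
updated_today = stamp_dmoz_imports(conn, "2004-04-20", ["2004-04-20"])
```

Only the DMOZ row modified on the given day is stamped; the manually added link and the older DMOZ link are left alone.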

But, not extremely hard, if this is what you need.

Actually, the Contact field could be set to "DMOZ Import Date: 4/20/04" for example, and you could use a simple global test to check for the "DMOZ" in the Contact name, and display it as "Import Date" rather than "Contact_Name". That wouldn't even require a big mod, just a new template field.

Once you start using this plugin, you'll wonder how you got along without it :)


PUGDOG Enterprises, Inc.
Re: [pugdog] DMOZ Wizard In reply to
As I said, it needs to offer multiple choices when an existing record is found (and choices on how to figure that).

Just skipping the record all the time is not good, as there is no log file made of skipped records.

Just adding it will wipe out existing data.

Sometimes trivial additions make all the difference in the world.

If you can add in these "trivial" additions, then I will buy it. And I bet, so will other "almost" buyers.

Sure you can add in NEW categories, but the problem exists when the category or record is already there.

Sometimes I have to edit Titles, other times just the details/description.

The "Clean" could be made to NOT wipe out existing custom columns, could it not...
Re: [webslicer] DMOZ Wizard In reply to
Quote:
If you can add in these "trivial" additions, then I will buy it. And I bet, so will other "almost" buyers.

I'm still not sure you understand how it works. Take a look at dmoz_cron.cgi .. all it does, is slice up the RDF file into smaller parts, email (if set), and define the correct import commands to run nph-import.cgi. We DON'T have our own import utility built into this, as it is all done by RDFS2.pm (/admin/Links/Import I think).

The command that calls nph-import.cgi passes along the --rdf-update string, which means that it will check each link, to see if it already exists in the same category, under the same name/URL. If it does, it will skip.
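As an illustration of that skip-if-present behaviour, here is a small Python sketch. It is only a reading of the description above, not how nph-import.cgi is actually implemented; the key of (category, URL) and all names in it are assumptions.

```python
def rdf_update(existing_keys, incoming):
    """Add links that are not already present and skip the rest.

    existing_keys: set of (category, url) pairs already in the database.
    incoming: iterable of dicts with 'category', 'url' and 'title' keys.
    Returns (added_links, skipped_count).
    """
    added = []
    skipped = 0
    for link in incoming:
        key = (link["category"], link["url"])
        if key in existing_keys:
            # Already imported into this category: leave the stored
            # record (and any local edits to it) untouched.
            skipped += 1
            continue
        existing_keys.add(key)
        added.append(link)
    return added, skipped


existing = {("Computers/Internet", "http://example.com/a")}
batch = [
    {"category": "Computers/Internet", "url": "http://example.com/a", "title": "Site A"},
    {"category": "Computers/Internet", "url": "http://example.com/b", "title": "Site B"},
]
first_added, first_skipped = rdf_update(existing, batch)
second_added, second_skipped = rdf_update(existing, batch)
```

Running the same batch twice adds nothing the second time, which is why repeating an import should not double the link count.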

>>>The "Clean" could be made to NOT wipe out existing custom columns, could it not...
<<<

I agree on this one. The only reason it does this is because I used the basic routines out of the LSQL database setup script, which obviously cleans out custom fields, etc. It should be pretty simple to do. Just take out the code that cleans this out, and then have a few delete_all() commands. I'm not sure if pugdog wants to take a look at doing this; otherwise, when he is finished with his modifications, I'll take a look into it.

Cheers

Andy (mod)
Re: [webslicer] DMOZ Wizard In reply to
>> Just skipping the record all the time is not good, as there is no log file made of skipped records.
>> Just adding it will wipe out existing data.

I'm really missing something here. We are not looking at a spider seeking out new, unknown links. These are links that are already imported in your database, from DMOZ, and links not yet imported.

Skipped records already exist, and are already in your database, so what is the issue? You want to be notified that the record you got from DMOZ is in DMOZ? If you do a 5,000 record import, and there is one new record, you'll have 4,999 "skipped" records, that are pretty pointless.

If you are looking for changed records, that is a double edged sword. Also, DMOZ drops good links for all sorts of reasons, and keeps bad ones around way longer than they should. You are better running the "Validate Links" and keeping your own record of still-live sites. It will be more accurate than DMOZ which _rarely_ seems to prune dead links. At least not as often as it should.


I'm pretty hard on this kind of tool, and I've been beating it to death with loads of imports the past 2-3 weeks. I really don't see what you are asking for.

If you have something specific in mind, for your needs, maybe you need a custom job. But, I don't see what added functionality you are trying to get.


The only thing I can see in all this, is something I've wanted for pruning duplicate links.

1) the database runs, and checks if the link (URL) exists in the database. If it does, it checks the Title & Description to see if it matches.
2) if they match, it's skipped (in the duplicate database, it's deleted, after a cat_links addition). At most, keep a count of skipped links, there is _no_ point in doing anything else.
3) if they don't match, insert the link into a duplicates database.
4) if the link exists, but the category is different, a) ignore b) add a cat_link record if the category exists c) add to a suggestion database for adding a catlinks record, or creating a new title if the category doesn't exist.
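The four steps above could be sketched roughly as follows, in Python. This is one possible ordering of the checks (category first, then Title and Description), and for step 4 it picks options b and c; the Link record, the action names and the lookup structures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    url: str
    title: str
    description: str
    category: str


def classify(incoming, by_url, known_categories):
    """Decide what to do with one incoming link.

    by_url: dict mapping URL -> Link already in the database.
    known_categories: set of category names that already exist.
    Returns one of: 'insert', 'skip', 'duplicate', 'add_cat_link', 'suggest'.
    """
    existing = by_url.get(incoming.url)
    if existing is None:
        return "insert"  # URL not seen before: a normal import
    if existing.category != incoming.category:
        # Step 4: same URL filed under a different category.
        if incoming.category in known_categories:
            return "add_cat_link"
        return "suggest"
    if (existing.title, existing.description) == (
        incoming.title,
        incoming.description,
    ):
        return "skip"  # Step 2: identical record, only worth counting
    return "duplicate"  # Step 3: same URL, changed text, park it for review


stored = Link("http://example.com/a", "Site A", "First site", "Computers/Internet")
by_url = {stored.url: stored}
categories = {"Computers/Internet", "Computers/Software"}
```

Only the 'duplicate' and 'suggest' outcomes would need a human to look at them, which keeps the review list to changed records rather than every skipped one.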

This adds a slight bit of functionality that I just haven't been able to allocate the time for. The current tools allow you to do all this, just not in a simple/integrated manner.


PUGDOG Enterprises, Inc.
Re: [pugdog] DMOZ Wizard In reply to
Pugdog,

1. It is very important to know when DMOZ data changes - descriptions, titles, etc.

2. Databases often START with DMOZ, but end up DMOZ + Edited DMOZ - REMOVED DMOZ + Other links. Therefore, the need as described in my previous posts.

It is not the typical database that stays 100% DMOZ.

Therefore your product is good only to start with, not to use on a regular basis - if anyone actually gets submissions, or edits descriptions and titles, that is.

Using it straight off would greatly change our databases. Sites we removed would show up again, etc.

Sometimes automated tools are more usable when combined with some manual review, decision and correction steps, and this is one case where that applies.
Re: [webslicer] DMOZ Wizard In reply to
What you want defeats the point of this plugin. This is *supposed* to be straightforward, simple, and AUTOMATED.

I can see the potential for another plugin called "DMOZ Updater" for instance, which would in essence be a spider that simply took its data from the DMOZ dump.

What you are describing, is the behaviour of the original GT spider. The second version was sort of different, and current versions are too difficult to set up (they need sample import filters, for a variety of things).

Actually, I wish I could locate the original GT Spider tar, as it had some really neat code in it that was sort of abandoned in later versions.

The *PROBLEM* with what you want to do, is that moving a link from one category to another, can either screw things up -- ie: duplicate insertions, or create *massive* problems in "suggesting" where the link should go.

Some sites will want to have unique URLs and SINGLE categories, some unique URLs and multiple categories, and some will allow multiple URLs in different categories.

It depends on if the site is CATEGORY or LINK driven.

ie: if searches are #1, unique URL's are more valid, if the site is mostly browsed, proper categorization is more important.

DMOZ provides only 3 pieces of data in their dump:

Category
Title
Description

Creating such a list for matched/itemized import is not trivial, but not hard.

The problem is it's *NOT* scalable. This is why the original plugin was automated. On larger sites, the number of comparisons would be daunting, and people would beg for automation. The automation is built in, and you are begging for a manual system!

What size sites are you talking about? Over 300 links starts to be a pain, over 1000 links starts to be a real pain, and 10,000 links is flat out unmanageable.

You are asking to page through several thousand links, when 90% of the posts on this sort of item complain about having to page through a few dozen, or a few hundred.

I'm thinking about this, but I still don't see the complete functionality, since especially for larger sites, unless existing+unchanged links are completely ignored, there is a data logging/pigging problem that starts to get insurmountable. It's sort of triplicating data -- one in DMOZ, one in your directory, now one in the "to be imported" area. This so violates database normalization, it's giving me hives <G>


PUGDOG Enterprises, Inc.